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Abstract 

We  consider  computing  the  singular  value  decomposition  of  a  bidiagonal  matrix  B. 
This  problem  arises  in  the  singular  value  decomposition  of  a  general  matrix,  and  in 
the  eigenproblem  for  a  symmetric  positive  definite  tridiagonal  matrLX.  We  show  that  if 
the  entries  of  B  are  known  with  high  relative  accuracy,  the  singular  values  and  singular 
vectors  of  B  will  be  determined  to  much  higher  accuracy  than  the  standard  perturbation 
theory  suggests.  We  also  show  that  the  algorithm  in  [Demmel  and  Kahan]  computes 
the  singular  vectors  as  well  as  the  singular  values  to  this  accuracy.  We  also  give  a 
Hamiltonian  interpretation  of  the  algorithm  and  use  differential  equation  methods  to 
prove  many  of  the  basic  facts.  The  Hamiltonian  approach  suggests  a  way  to  use  flows 
to  predict  the  accumulation  of  error  in  other  eigenvalue  algorithms  as  well. 


1     Introduction 

The  singular  value  decomposition  (SVD)  of  a  real  general  n  by  n  matrix  A  is  the  factorization 
A  =  VEV^ ,  where  [/  and  V  are  orthogonal,  E  =  diag  {cti,  . .  .,<7„},  and  (Ti  >  ■  ■  ■  >  Cn  >  0. 
The  (T,'s  are  the  singular  values  of  A,  the  columns  v,  of  V  the  rig/i?  5mpu/ar  vectors  of  .4, 
and  the  columns  u,  of  [/  the  left  singular  vectors  of  /I. 

In  this  paper  we  will  consider  the  SVD  of  a  bidiagonal  matrix  B 


B  = 


ai      6i 


6n-l 


(i.i: 


where  we  may  assume  without  loss  of  generality  that  the  a,  and  b,  are  positive.  (Recall 
that  this  assumption  implies  that  all  the  a,  are  positive  and  distinct  [ParSO].)  This  problem 
arises  as  the  final  stage  of  the  SVD  of  a  general  matrix  .4  [GK65,GVL83],  as  well  as  in 
computing  the  eigendecomposition  of  a  symmetric  positive-definite  tridiagonal  matrix  T 
[BD88].  Both  computations  arise  frequently  in  a  wide  variety  of  applications.  Our  goal  in 
this  paper  is  threefold:  to  show  that  the  SVD  of  a  bidiagonal  B  can  be  computed  much 
more  accurately  than  the  SVD  of  a  general  matrix  A,  to  explain  this  with  the  aid  of  a 
Hamiltonian  differential  equation  underlying  the  SVD  algorithm  used  for  B,  and  to  suggest 
using  similar  differential  equations  to  find  high  accuracy  algorithms  for  other  eigenvalue 
and  singular  value  problems. 

How  accurately  can  the  SVD  of  a  general  matrix  A  be  computed?  To  answer  this 
question,  we  must  consider  both  the  effects  of  uncertainty  in  the  initial  data  A,  as  well  as 
errors  introduced  by  an  algorithm  (roundoff  errors  and  the  effects  of  the  stopping  criterion); 
a  good  algorithm  introduces  errors  no  worse  than  inherent  errors  caused  by  uncertainty  in 
the  data.  The  standard  approach  is  to  bound  the  uncertainty  6 A  in  the  initial  data  by  its 
two-norm  \\SA\\:  we  say  that  6 A  is  an  absolute  error  of  scale  t)  in  A  if  ||<5.4||  /  ||^||  <  rj.  With 
this  definition  of  rj  we  have: 

(A)  -  The  singular  values  a,  of  A  and  a[  of  A  +  6A  can  differ  by  at  most  t]\\A\\  for  all  i 

[GVL83]: 

k.-'^:i<'?Pii  (1.2) 

(B)  -  Let  u,  and  u[  be  the  t-th  left  unit  singular  vectors  of  A  and  .4  -f-  SA  respectively,  and 

V,  and  u,'  be  the  right  unit  singular  vectors.  Let  the  absolute  gap  for  a,  be  defined  as 
the  distance  from  a,  to  the  nearest  different  singular  value,  normalized  by  (Tj: 


absgap,  =  min_,^, |(7,  -  aj\/ai 


(1.3) 


If  Tj  <  absgapi/2,  then  the  sines  of  the  angles  between  u,  and  uj  (sin5(u,,  uj)),  and 
between  i»,  and  u,'  (sin^(t;,,  i;,')),  are  bounded  as  follows[DK70]: 


max(sin  ^(u,,u^),sin5(v,,  V,')  < 


absgap,  -  tj 


(1.4) 


The  reason  for  limiting  rj  <  absgap,/2  is  that  for  larger  perturbations  the  bound  (1.4) 
is  vacuous.  This  reflects  the  fact  that  larger  perturbations  by  a  general  matrix  can 
cause  (7i  to  merge  with  another  singular  value,  so  that  the  singular  vectors  fail  to  be 
uniquely  defined. 

Now  we  consider  the  errors  introduced  by  the  standard  aJgorithm  [GK65]  for  the  SVD 
of  a  general  matrix.  We  assume  a  standard  model  of  floating  point  arithmetic,  with  relative 
errors  of  size  at  most  e,  the  machine  precision,  in  each  basic  operation,  and  assume  neither 
overflow  nor  underflow  occur.  Under  these  assumptions  it  is  well  known  [GVLSS]  that  the 
error  bound  of  the  standard  algorithm  is  equivalent  to  an  uncertainty  6A  in  the  initial 
data  with  absolute  error  scale  \\6A\\  /  \\A\\  <  pin)e,  where  p{n)  depends  on  details  of  the 
arithmetic  and  algorithm,  and  is  a  modestly  growing  function  of  the  dimension  n  of  .4  (a 
cubic  polynomial  in  n  with  modest  coefficients).  In  other  words,  the  error  introduced  by 
the  algorithm  is  no  worse  than  p(n)£  \\A\\  =  p{n)£ai  in  th^  singular  values  and  no  worse 
than  p(n)£/absgap,  in  the  singular  vectors. 

In  particular,  suppose  A  has  one  or  more  tiny  singular  values  (?_,,  where  by  tiny  we  mean 
(jj  <  cTi  =  ||v4||.  The  error  bound  p{n)scri  for  the  singular  values  implif^^  *hat  while  large 
singular  values  may  be  computed  with  high  relative  accuracy,  tiny  ones  will  in  general  not 
be,  since  the  error  bound  may  greatly  exceed  their  value.  Also,  if  there  are  two  or  more 
tiny  singular  values,  their  absolute  gaps  will  necessarily  be  small  compared  to  a\,  and  their 
singular  vectors  correspondingly  uncertain.  Indeed,  these  uncertainties  are  unavoidable  as 
long  as  one  considers  general  perturbations  6 A  bounded  only  in  norm,  because  the  bounds 
in  (A)  and  (B)  above  are  attainable  [Wil65]. 

Now  we  return  to  the  case  of  a  bidiagonal  matrix  B.  It  turns  out  that  the  effects  of 
both  initial  data  uncertainties  and  roundoff  errors  are  significantly  smaller  than  for  general 
matrices,  and  are  controlled  by  the  relative  error  of  6B 

T/,  =  (2n-l).max|log:^l^±^^|  (1.5) 

',:  B,,j 

instead  of  by  its  norm  \\6B\\.  When  r/,  <  1,  this  means  that  the  sum  of  the  componentwise 

SB 

relative  errors  XZ,,j  \~b7^\  's  approximately  bounded  by  7]^.  In  other  words,  the  zero  entries 
of  B  must  remain  zero,  and  we  only  permit  relative  perturbations  in  the  remaining  entries, 
rather  than  norm  bounded  perturbations.  Using  rjr  as  measure  of  uncertainty  in  B,  one  can 
prove: 

(A')  -  Let  the  a,  be  the  singular  values  of  B  and  a,'  be  the  singular  values  of  B  +  6B.  Then 

g-nr  <  £l  <  e"'  (1.6) 

This  bound  holds  for  all  r;,.  >  0,  just  as  (1.2)  holds  for  all  t;  >  0.  When  rjr  <  1,  these 
upper  and  lower  bounds  on  ^  are  approximately  1  ±  r/^,  meaning  that  small  relative 
perturbations  in  the  entries  of  B  only  cause  small  relative  perturbations  in  the  a„ 
independent  of  their  magnitudes. 


(B')  -  For  simplicity  of  notation  write  e^'  =  1  +  r?^;  when  t]^  <C  1,  ^r  ~  ^r-  Let  u,  and 
Vi  be  the  singular  vectors  of  B,  and  let  u[  and  u-  be  the  singular  vectors  of  5  +  6B. 
Let  the  relative  gap  for  cr,  be  defined  as  the  relative  distance  from  <7i  to  the  nearest 
different  singular  vzilue: 

relgap,  =  min^^,\a,  -  ct_,|/|(7,  +  CTj|  (1.7) 

If  T/'  <  relgap,,  then  the  sines  of  the  angles  between  u,  and  u[,  and  between  r,  and 
v'-,  are  bounded  by 

2i/2n'(i  J.  „') 

max(sin^(u,,u;),sin^(u,,i;')  <  r^^ ^  (1.8) 

relgap,  -  77; 

Result  (A')  was  proven  in  [Kah68,DK88,BD88];  we  incln-i'^  a  (new)  proof  for  the  con- 
venience of  the  reader.  Result  (B')  is  proved  here  for  the  fir^i  lime;  it  was  conjectured  in 
[DK88]  with  a  proof  of  a  weaker  result  in  [BD88]. 

One  can  easily  see  that  results  (A')  and  (B')  are  always  at  least  about  as  strong  as 
their  counterparts  (A)  and  (B).  To  show  how  much  stronger  they  may  be,  consider  making 
relative  perturbations  of  size  10~'°  in  a  3  by  3  bidiagonaJ  matrix  with  singular  values 
Ci  =  1,  CTj  =  2  •  10~^°,  and  0-3  =  10"'^°.  Note  that  absgaps  =  absgap^  =  10"'^°,  and  that 
relgapz  =  relgap2  =  1/3.  Since  the  norm  of  this  perturbation  is  about  10~^°,  we  may  apply 
(A)  and  (B)  to  get  the  absolute  error  bound  10~^°  >  0-3  for  (73,  and,  since  10~'°  >  absgap^, 
no  error  bound  for  the  singular  vectors  at  all.  Applying  (A'),  we  get  a  relative  error  bound 
of  about  5  •  10~^°  in  0-3.  Thus,  we  have  at  least  9  decimal  digits  of  accuracy  in  173,  whereas 
(A)  predicts  changes  10^°  times  larger.  Applying  (B'),  we  get  an  error  bound  of  about 
2.1  •  10"^  in  the  direction  of  the  singular  vectors,  whereas  (B)  provides  no  error  bound  at 
all.  The  same  results  hold  for  02  and  its  singular  vectors. 

In  summary,  absolute  uncertainties  in  the  entries  of  a  general  matrix  .4  yield  absolute 
error  bounds  on  its  singular  values,  and  error  bounds  depending  on  the  absolute  gap  for 
its  singular  vectors.  In  contrast,  relative  uncertainties  in  the  entries  of  a  bidiagonal  matrix 
B  yield  relative  error  bounds  on  its  singular  values,  and  error  bounds  depending  on  the 
relative  gap  for  its  singular  vectors. 

Given  the  much  greater  accuracy  to  which  singular  values  and  singular  vectors  of  bidi- 
agonal matrices  are  determined  by  the  data,  it  is  desirable  to  have  an  algorithm  which 
computes  them  to  their  inherent  accuracy.  In  [DK88]  such  an  algorithm  was  provided  for 
computing  the  singular  values  to  high  relative  accuracy.  This  new  algorithm  is  a  hybrid  of 
the  standard,  shifted  QR  algorithm  in  [GK65,BDMS79],  and  a  new,  stable  implementation 
of  QR  with  a  zero  shift.  It  was  also  demonstrated  empirically  that  this  new  algorithm  was 
about  as  fast,  and  often  much  fcister,  than  the  standard  algorithm  [BDMS79],  which  can 
only  provide  absolute  error  bounds  on  the  singular  values. 

In  this  paper  we  will  prove  that  the  algorithm  in  [DK88]  also  computes  the  singular 
vectors  of  bidiagonal  B  with  an  error  bound  depending  on  the  relative  gap  as  in  (B').  More 
precisely,  it  will  compute  singular  vectors  u,  and  v,  with  an  error  bound  q{n)e /relgap,, 
where  q{n)  is  a  modest  function  of  the  dimension  n  of  B,  and  e  is  the  machine  precision 
as  above.  Thus,  this  algorithm  computes  all  features  of  the  SVD  of  a  bidiagonal  matrix 
to  their  inherent  uncertainties.  (.Actually,  we  will  need  to  change  one  line  in  the  algorithm 


of  [DK88],  but  this  change  will  effect  none  of  the  numerical  or  timing  results  reported  in 
[DK88].  We  will  discuss  this  change  further  in  sections  3  and  6.) 

The  proof  that  the  algorithm  computes  singular  vectors  as  accurately  as  claimed  has 
two  new  parts:  bounding  the  error  due  to  the  stopping  criterion,  and  bounding  the  error 
due  to  roundoff  in  the  zero-shift  QR  iteration.  The  stopping  criterion  bounds  are  similar  to 
the  bounds  in  (A')  and  (B'),  and  are  obtained  via  a  specialized  perturbation  argument  in 
which  the  SVD  problem  for  a  bidiagonal  matrix  is  converted  into  an  eigenvalue  problem  for  a 
tridiagonal  matrix  with  zero  diagonal,  following  [GVL83]  (see  section  4).  On  the  other  hand, 
bounds  for  the  roundoff  errors  due  to  repeated  iterations  of  the  algorithm  are  conveniently 
analyzed  in  terms  of  the  long  time  behavior  of  a  Hamiltonian  differential  equation  on  the 
space  of  matrices  naturally  associated  with  the  algorithm  ([DLT89,Sym82,Chu86]). 

To  proceed,  we  need  to  introduce  some  notation.  QR  iteration  with  a  zero  shift  applied 
to  a  genercd  invertible  matrix  Aq  produces  a  sequence  of  orthogonally  similar  matrices  /I, 
as  follows.  Given  A,,  compute  its  QR  decomposition  A,  =  QR,  where  Q  is  orthogonal 
and  R  is  upper  triangular  with  positive  diagonal.  Then  A,+i  =  RQ  =  Q^AiQ.  It  is  well 
known  that  if  Aq  has  eigenvaJues  with  distinct  moduli,  then  A,  converges  to  a  triangular 
matrix  with  the  eigenvalues  on  the  diagonal  as  i  — •  oo.  This  algorithm  may  be  applied  to 
the  bidiagonal  singular  value  problem  as  follows  [GVL83].  Let  Bq  be  our  initial  bidiagonal 
matrix.  Given  B,,  compute  the  QR  decompositions  BiBj  =  Q\R\  and  Bj B,  =  Q2R2- 
Then  let  B,+\  =  QJB,Q2-  Then  Bi  is  bidiagonal  for  all  t  and  converges  as  i  — ►  oo  to  a 
diagonal  matrix  with  the  singular  vaJues  on  the  diagonal.  Observe  that  Bi+\Bj^^  =  RiQ\ 
and  Bj^iBi+i  =  R2Q21  so  that  the  above  zero-shift  SVD  algorithm  implicitly  applies  the 
usual  QR  iteration  to  ByBj  and  BJ B^  simultaneously. 

We  think  of  the  SVD  algorithm  as  a  mapping  from  R^"~*  (the  entries  of  5,)  to  R^"~^ 
(the  entries  of  5,+i).  To  understand  how  errors  propagate  through  iterations  of  the  SVD 
algorithm,  it  is  natural  to  look  at  the  Jacobian  of  this  map,  since  the  Jacobian  describes  how 
small  perturbations  in  5,  affect  J3,+i.  However,  since  we  are  interested  in  the  propagation 
of  re/a<ive  errors,  we  will  look  instead  at  a  Jacobian  which  maps  small  relative  perturbations 
in  B^  to  relative  perturbations  in  5,+i.  To  this  end,  we  will  work  with  the  logarithms  of 
the  entries  5,  and  Bi+\,  since  small  perturbations  in  the  logarithms  of  the  matrix  entries 
are  equivalent  to  small  relative  perturbations  in  the  matrix  entries  themselves.  Thus,  we 
will  think  of  a  bidiagonal  5  as  a  point  in  R^""'  through  the  identification  (recall  that  the 
nontrivial  entries  of  B  are  positive) 

log  61 


Ci        61 

■ 

• 

••       6n-l 

■<;     > 

log6„_i 
logci 

On 

.    loga„ 

and  think  of  one  step  of  the  SVD  algorithm  as  a  map  F  which  maps  vectors  of  logarithms 
of  entries  of  B,  to  vectors  of  logarithms  of  entries  of  5,+i,  i  =  0,1,2 Thus  for  j  >  t 

F<J-')(5.)  =  Fo-yoFjB,)  =  B, 

J  — 1  times 


We  will  call  its  Jacobian  M{j,  t),  which  by  the  chain  rule  is  the  product  of  the  one  step 
Jacobians  M{j,i)  =  M{j,j  -  1)  ■  ■  ■  M{i  +  1,:).  It  is  M{j,i)  which  describes  how  initial 
relative  errors  in  B  and  roundoff  errors  committed  during  prior  SVD  iterations  propagate 
during  later  SVD  iterations. 

The  following  four  facts  were  observed  during  initial  numerical  experiments: 

Fact  1:  The  eigenvalues  of  M{j,i)  appear  in  reciprocal  pairs.    In  other  words,  if  A  is  an 
eigenvalue,  so  is  1/A. 

Fact  2:  Near  convergence  (i.e.  for  i  large  enough),  the  eigenvalues  of  M{i  +  l,i)  are  simple, 
approach  1  and  all  lie  on  the  unit  circle. 

Fact  3:  As  i  — ►  oo,  M(i  +  \,i)  converges  to  the  constant  matrix 

-2      2 


M«,  = 


/n-l 
0 


In 


where  ?„  = 


-2     2 


(1.9) 


independent  of  initial  data. 


Fact  4:  ||M(j,  r)||  grows  linearly  in  the  number  of  SVD  steps;  j  —  i. 


More  precisely,  we  observed  numerically  for  a  large  class  of  problems  that 
||M(;,  i)|loo  ^  506  ■  n  ■  {j  -  i)  (n  is  the  matrix  dimension).  In  section  9,  using  O.D.E. 
methods,  we  will  prove  ||M(;,  i)||g^  <  (8n  -  4)(j  -  i)  +  0(1).  This  is  the  essential  property 
of  roundoff  error  propagation  which  lets  us  prove  that  the  algorithm  computes  singular  vec- 
tors as  accurately  as  claimed.  This  linear  growth  is  to  be  expected  because  near  convergence 
we  have 

Tn-i   U  -  or„ 


M{j,i)^M^'  = 


0 


which  grows  linearly  in  norm.  In  section  10  we  sketch  an  alternate  proof  of  Fact  4  based 
on  this  intuition. 

As  we  will  see.  Facts  1  and  2  follow  from  the  observation  that  linear  combinations  of 
the  variables  log6i, . .  .,loga„  satisfy  a  Hamiltonian  differential  equation  with  Hamiltonian 
-tr(log(55-^)^)/4.  The  relationship  between  the  flow  and  the  algorithm  ■  •  •  -*  B,  — ► 
B,+i  — ►  •••  is  as  follows:  if  log6i(t), . .  .,loga„(<)  solve  the  Hamiltonian  flow  with  initial 
conditions  \ogb\   ,. .  .,logan  ,  then 


log6i(t) 


log  6';' 


where 


.  logan(t)  J  logaj,"' 


,(')     ;,(') 


'1 


b\' 


B,= 


■'n-l 

J') 


gives  the  t-th  step  in  the  SVD  algorithm,  starting  from 


Bo  = 


.(0)      .(0) 


e, 
«<»' 


(see  sections  7  and  8  below).  In  contrast  to  many  eigenvalues  algorithms  (see  [Sym82, 
DLT89]),  where  the  underlying  symplectic  structures  are  Lie-Poisson  structures,  here  the  un- 
derlying structure  is  a  so  called  5Wj/anm  structure  [Sem84].  The  variables  logaj,  logaia2  = 

logci  +loga2,  ...  ,  logoi  •  •   On-i  =  logai  + |-loga„_i  will  turn  out  to  be  the  momentum 

variables  (note  that  loga„  does  not  appear  because  it  is  determined  by  the  other  variables 
through  the  relation  ai  •  •  -a^  =  det(5)  =  constant),  and  the  log 6,  will  be  the  position  vari- 
ables. In  the  limit,  the  momenta  converge  to  constants  (the  sums  of  the  logarithms  of  the 
singular  values),  and  the  positions  move  at  constant  speed  toward  -oo  (i.e.  the  offdiagonals 
bi  decay  to  zero  geometrically). 

Fact  1  wiU  follow  from  the  fact  that  the  Jacobian  with  respect  to  the  canonical  Hamil- 
tonian  variables  is  symplectic;  symplectic  matrices  have  eigenvalues  appearing  in  reciprocal 
pairs.  Facts  2,  3  and  4  will  follow  from  the  asymptotics  of  the  Hamiltonian  system.  Fact  1, 
and  Krein's  theory  of  strongly  stable  symplectic  matrices  [Kre50,Kre55]. 

The  use  of  the  differential  equation  as  outlined  above  suggests  a  paradigm  for  seeking 
algorithms  to  solve  other  eigenvalue  problems  to  high  relative  accuracy.  The  symplectic 
interpretation  of  the  (fortuitously  chosen)  relative  errors  is  that  they  correspond  to  pertur- 
bations in  the  canonical  variables  for  the  symplectic  structure  in  which  the  SVD  cilgorithm 
is  Hamiltonian.  The  general  paradigm  we  suggest  is  the  following:  given  a  Poisson  structure 
in  which  a  given  eigenvalue  algorithm  is  Hamiltonian,  one  should  try  to  construct  natural 
global  canonical  variables  (i.e.  a  global  Darboux  coordinate  system).  Such  variables  would 
indicate  which  functions  of  the  eigenvalues  (in  our  case,  their  logarithms)  are  relatively 
insensitive  to  appropriate  perturbations  in  the  matrix  (in  our  case,  relative  perturbations 
in  the  entries)  and  hence  are  computable  to  high  accuracy. 

The  rest  of  the  paper  is  organized  Jis  follows.  Section  2  proves  the  perturbation  results 
(A')  and  (B').  Section  3  describes  the  algorithm,  and  section  4  bounds  the  error  in  the 
singular  values  and  vectors  introduced  by  its  stopping  criterion.  Section  5  uses  the  bound 
on  ||M(j,  Oil  (Fact  4)  to  prove  the  error  bounds  for  the  zero-shift  QR  algorithm.  Section 
6  proves  global  error  bounds  for  the  singular  vectors  computed  by  the  overall  algorithm. 
Section  7  describes  flows  and  the  SVD  algorithm.  It  also  provides  an  independent  proof 
of  the  convergence  of  the  zero-shift  SVD  aJgorithm  with  detailed  (and  we  believe  new) 
asymptotic  expressions  for  the  matrix  entries.  Section  8  discusses  the  Hamiltonian  structure 
of  the  flow  and  proves  Fact  1  above.  Section  9  analyzes  the  asymptotics  of  ||M(j,  t)||  and 
proves  Fact  4.  Section  10  discusses  the  spectrum  of  the  one-step  Jacobian  of  the  SVD,  and 
proves  Facts  2  and  3.  Section  11  presents  numerical  experiments,  and  section  12  draws  our 
conclusions. 

We  note  that  an  alternative  approach  to  computing  singular  values  using  gradient  flows 
is  presented  in  [Dri87]. 


Caveat:  We  will  abuse  the  word  "algorithm"  in  several  differents  ways.  Sometimes  it 
will  refer  to  one  step  of  the  QR  iteration  (with  or  without  shift)  and  sometimes  it  will  refer 
to  the  full  implementation  with  stopping  criteria  (the  conventional  [BDMS79]  or  the  new 
[DK88]  one).  Which  one  is  meant  will  be  clear  from  context. 


2      Perturbation  Theory  for  Singular  Vectors 


In  this  section  we  prove  the  perturbation  bound  (1.8),  which  says  that  small  relative  per- 
turbations in  the  entries  of  a  bidiagonal  matrix  perturb  the  singular  vectors  by  an  amount 
proportional  to  the  reciprocal  of  the  relative  gap  (1.7).  This  result  was  conjectured  in 
[DK88],  and  a  weaker  result  proven  in  [BD88].  For  the  reader's  convenience  we  also  include 
a  (new)  proof  of  the  eigenvalue  bound  (1.6). 

The  proofs  depend  on  the  following  standard  transformation  [GK6.5].  Suppose  the  bidi- 
agonal matrix  B  has  entries 


/  ^1 


v 


-'2 
S3 


O 


54 


■S2n-3 


o     \ 


•S2n-2 
■S2n-1    / 


(2.1) 


and  SVD  B  =  UEV^,  with  E  =  diag{cTi,.. 
Then  the  symmetric  matrix 


. ^n},  V  =  [vi,...,Vn]  and  U  =  [ui,...,u„]. 


5  = 


5l 

0 

-»2 


O  \ 


S2 
0 


S2n-l 

0  / 


(2.2) 


\    O  •92n-l 

has  eigenvalues  ±a,  with  normalized,  associated  eigenvectors 

hf  =  2"^/^(l',i,±U,l,r.2,±W.2,...,l',n,±",n)^   • 

(Note  for  future  reference  that  the  components  of  h^  are  bounded  by  l/v^.)  Thus,  the 
eigendecomposition  for  S  also  yields  the  SVD  for  B,  and  so  perturbation  theory  for  the 
eigenproblem  for  5  also  computes  perturbation  theory  for  the  SVD  of  B. 

As  described  in  [GvL],  the  transformation  B  >-^  S  should  be  viewed  as  the  result  of 
composing  the  SVD  — ►  eigenproblem  map. 


B^ 


0      5^ 
B    0 


with  a  perfect  shuffle  of  the  rows  and  columns,  {l,2,3,...,2n}--{l,7i+l,2,n-l-2 n,2n}, 

taking 

0      B'^ 

B    0 


^S  . 


Our  first  result  bounds  the  effect  of  infinitesimal  relative  perturbations  in  the  entries  of 
5;  the  second  result  generaJizes  to  finite  perturbations. 

Recall  the  standard  fact  that  the  eigenvalues  of  a  tridiagonaj  matrix  with  nonzero 
off-diagonal  entries  are  simple,  and  hence  that  the  eigenvalues  and  eigenvectors  depend 
smoothly  on  the  entries  of  the  matrix. 


Theorem  2.3  Let  S{t)  be  a  matrix  of  the  form  (2.2),  but  with  entries  5,(t)  which  are 
smooth,  positive  functions  oft.  Let  ±(T,{t)  and  hf{t)  denote  the  eigenvalues  and  eigenvectors 
of  S{t)  respectively.  Then 

I— — I  <  (2n- l)(max|— — I)  (2.4) 

(7,(0)  m        5^(0) 


and 


iLi.nM,   /.o_        ,^/_-..,^m(0),,,  1 


||/tf(0)||<(2n-l)(raaa|-:^|)(— — -).  (2.5) 

m      5m(0)      relgap, 


Proof.    Let 


{M1.M2,M3,M4,---}  =   {+(7i,-<7i,+(T2,-(r2,..  .} 


{Wi,u;2,u;3,u;4,..  .}   =   {/ij",/!!  ,/l2',/i2  '•  •  •) 


(2.6) 


denote  the  eigenvalues  and  associated  eigenvectors  of  S.    Then,  by  regular  perturbation 
theory,  and  repeated  use  of  the  eigenvalue  equation, 

/i,     =     (u,.,5u;.)=   £(^)(u;.(j)3,u;,(j  +  l)) 

2n-l    . 


=     2  ^  ^u',(;)(/i,u;,(;)  -  5j_iu),(;  -  1)) 


2n-l    .  J 


=  2m.  E  ^(E^?(^K-ir'")-  (2. 


j  =  l    •'j     m=l 


Thus 

1^1  <  2(max|^|)'^'|  X:  u,?(m)(-ir-|  . 

On  the  other  hand,it  follows  from  (2.3)  and  the  perpendicularity  of  /i^  and  h~ ,  that 

f^  u,2(2m)  =  X:  '^'(2m  -  1)  =  ^(  E  ^?(m))  =  ^  .  (2.8) 

m=l  m=l  m=l 

But  then 

|^u;2(m)(-l)^— |<i,  (2.9) 

m=l  '^ 

which  proves  (2.4). 

Again  by  regular  perturbation  theory  and  repeated  use  of  the  eigenvalue  equation, 


w,     =     E(u'k,5u;,] 


Wife 


fc#.  ^'  -  ^*^ 
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2n-l      •  Uk 

m=l      '^"'     Jt#i 
2n-l      • 

=      13  (  — )13(wfc(m)u>,(m)(/i, +  /ifc)-'-'.("^-  l)5,T.-iWife(m) 


m=l      ■^'^     *:#. 


-u;i(Tn  -   l)5m-l<-^.("^)) 


Mt  -  M* 


2n-l 


•Sm  ^  V^  ,M.  +  Mfcs/ V-     ./^A./z;\/       i  \m-^ 


^— '       c  _      *— '     /;.  —  III.       "-^ 


m^i  ^-  ;:7.  ^'"^^  ^=1 


Using  the  orthnormality  of  the  eigenvectors  Ujt,  we  obtain 

2n— 1  I  "I 


=     ( majc| 


S.  2n-l 


■^^        m  =  l      Jk5i 

But  again,  by  orthonormality, 

Jt=l      /=1  l</,(!<m     fc=l 

=     f:u;?(£)<l. 

A  simple  computation  shows  that  if  /i,  =  ±a^i,  then  max^^,  l^^r^l  =  (rel  gap,,)~^.   This 
proves  (2.5),  and  the  Theorem.       D 

Remark.  From  (2.3),  ||Af  ||2  =  i(||u.||2  +  ||u.||2).  Hence  (2.5)  yields 

max(||u.ll,||i.||)  <  V2(2n-  l)(max|^|)(— ^)  .  (2.11) 

We  also  have  the  following  global  error  bound. 
Theorem  2.12   Let  B  and  B'  be  bidiagonal  matrices  with  positive  entries  B,ic  and  B[j^.  Let 

B' 

r?,  =  (2n-l)max|log--ii^|  (2.13) 

J.*  iijk 

be  the  relative  error  in  B  as  defined  in  (1.5),  and  77^  =  e'''  -  1  >  ry^  >  0.   Then 

_/ 

1  -  r?;  <  e""'  <  -i  <  e"'  =  1  +  r?;  (2.14) 

11 


which  implies 

-V'r<- -<V'r-  (2.15) 

Furthermore,  if  t)'^  <  relgap,,  the  sines  of  the  angles  9{u,,u[),9{v,,vl)  between  the  unper- 
turbed singular  vectors  u,,v,  and  the  perturbed  singular  vectors  u[,v[,  are  bounded  by 

m&x{s\ne{u„u[),sin9{v„v:))  <       ,"^^^^^"^9  (216) 

relgap,  -  tj^ 

Proof.  Let  S  and  S'  be  matrices  of  the  form  (2.2)  derived  from  B  and  B'  as  before. 
Set  5(()  =  5  +  t{S'  -  5),  0  <  <  <  1.  With  the  notation  of  Theorem  2.3,  we  have  from  (2.7) 
and  (2.9), 


/i,  Jo    at 

2n-l         -1      . 

fr{    Jo    s, 


2n-l  / 

=   Eiiog^i     . 

<       Ir    , 

which  proves  (2.14).  Note  that  the  eigenvalues  of  s(t)  cannot  pjiss  through  zero  as  t  varies 
from  zero  to  one. 

In  a  similar  way,from  (2.10)  and  the  calculation  that  follows  in  Theorem  2.3, 

,M.(0  +  Mfc(0|/^\im(0, 

w,     <  (  max  max  — ■ —  )   >       , 

"    '"-^<r<i  M.  V,(0-Mfc(<)'^;^i    5„(0'  ' 

which  yields 

\\w,  -  w,\\  <  (  max  rnax  I— — —     ry,  , 

0<(<1    fc#i      fi,{t)  -  ^lk[t) 

and  hence 

lPfy-.,*ll<(.^,.axi^;ill±£iili|,,., 

as  noted  at  the  end  of  the  proof  of  Theorem  2.3. 
Now  observe  that 

T)r{t)  =  (2n  -  1)  max  I  log  ^|  <  r/, 
for  all  0  <  <  <  1.  Thus  from  (2.14) 

Suppose  Otit)  >  (7/t(<)  >  0.  Then 

<T.(0  +  <7fc(0  <e'"(c^,(0)  +  ai(0)). 
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and 


so  that  for  tj'^  <  relgap,. 


,cT,(0  +  ajt(0,  ^         e"'         _       i  +  T?; 


a,(t)-ak(t)        relgap,-r]'^       relgap,  -  t)'^ 
The  same  inequality  holds  if  ak(t)  >  a,(t),  and  we  obtain 


,±^/         .,,±.,1/       VAl  +  V'r) 


\\{hfy-{hf)\\< 


relgap,  -  r?; 
As 


sme{u,,u[)<  \\u',-u,\\  <  v/2||(/i+)'-/i+|| 
sin<?(t;.,r:)  <  ||i/  -  v,\\  <  V2\\{h^)'  -  k^\\  , 


and  as  r?r  <  77^,  the  result  follows.       D 
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3     The  SVD  Algorithm 

In  this  section  we  describe  the  algorithm  for  the  bidiagonal  SVD.  This  algorithm  was  in- 
troduced in  [DK88],  and  discussed  in  detail  there.  Here,  we  present  a  simplified  version  of 
the  aJgorithm  which  is  adequate  to  prove  our  error  bounds.  The  practical  enhancements  we 
omit  here  can  greatly  improve  performance  in  some  cases,  but  will  not  invalidate  our  error 
analysis.  It  turns  out  our  eventual  error  bounds  will  depend  on  the  number  of  QR  steps 
necessary  for  convergence;  the  practical  enhancements  often  reduce  this  number  dramati- 
cally, and  we  summarize  our  numerical  experience  with  the  number  of  QR  steps  required 
in  section  11. 

Briefly,  the  algorithm  is  a  hybrid  of  the  standard  shifted  QR  algorithm  and  the  implicit 
zero-shift  QR  algorithm.  The  standard  shifted  QR  is  used  on  matrices  which  are  well- 
conditioned  {cTn  is  not  much  smaller  than  cr\),  and  implicit  zero-shift  QR  is  used  on  ill- 
conditioned  submatrices  (ct„  <  <T\).  Implicit  zero-shift  QR  is  much  more  accurate  than 
shifted  QR,  but  much  slower  if  the  matrix  is  well-conditioned.  Fortunately,  shifted  QR 
is  adequately  accurate  on  well-conditioned  matrices,  so  we  only  need  to  exploit  the  more 
accurate  implicit  zero-shift  QR  when  it  is  also  fast.  Thus,  as  the  algorithm  runs,  the 
offdiagonal  entries  decrease  and  are  eventually  set  to  zero,  deflating  the  matrix.  On  each 
newly  deflated  submatrix,  <Tn  and  <7i  are  cheaply  but  reliably  estimated  and  either  shifted 
QR  or  implicit  zero-shift  QR  used  depending  on  the  ratio  (7„/cri. 

We  will  need  to  change  one  line  in  the  algorithm  presented  in  [DK88]  in  order  to  prove 
our  error  bounds  in  section  6.  This  change  will  not  alter  any  of  the  numerical  or  timing 
results  reported  in  [DK88]. 

We  will  present  the  implicit  zero-shift  QR  algorithm,  the  stopping  criterion  for  setting 
tiny  offdiagonal  entries  to  zero,  and  finally  the  overall  algorithm.  Implicit  zero-shift  QR  calls 
the  subroutine  ROT{f,g,cs,sn,r),  which  takes  /  and  g  as  inputs  and  returns  r,  cs  =  cos^ 
and  571  =  sin^  such  that 


cs 

-sn 


sn 
cs 


■/■ 

9 

= 

r 
0 

(3.1) 


ROT(f, g,cs,sn,r):  take  /  and  g  as  input  and  returns  cs,  sn  and  r  satisfying  (3.1). 


if       (/  =  0)  then 

cs  =  0;  5n  =  1;  r  =  g; 
elseif  (I/I  >  1^1)  then 

t  =  g/f;  it  =  VI +  t^ 

cs  =  1/tt;  sn  =  t  *  cs\  r  =  f  *  tt 
else 


i  =  f/g-  tt  =  vTTT2 

sn  —  l/tt;  cs  =  t  *  sn;  r  =  g  *  tt 


endif 
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The  algorithm  also  calls  subroutine  UPDATE{c$,sn,vx,V2): 
UPDATE(cs,sn,vi,V2):  replace  n- vectors  Uj  and  V2  by  cs-vi+sn-V2  and  -8n-v\  +  csv2. 


for     t  =  1  to  n 

t;i(t)  =  cs  *t  +  sn  *  V2{i) 

t>2(z)  =   -371  *t  +  CS*  V2(i) 

endfor 

Implicit  Zero-Shift  QR  Algorithm:  Let  5  be  an  n  by  n  bidiagonal  matrix  with  diagonal 
entries  ai, . .  .,an  and  superdiagonal  entries  6i, . .  .,6„_i.  The  following  algorithm  replaces 
Ci  and  6,  by  new  values  corresponding  to  one  step  of  the  QR  iteration  with  zero  shift. 
It  also  updates  the  right  unit  singular  vectors  v,  ajid  left  unit  singular  vectors  u,  (at  the 
start  of  the  algorithm,  these  should  be  initialized  to  the  columns  of  the  identity  matrix: 
f.O)  =  ".(i)  =  <5.j- 

oldcs  =  1 

f  =  a\ 
9  =  bi 

for  i  =  1,  n  -  1 

call  ROT{f,g,cs,sn,r) 

call  UDPATE{cs,sn,v,,v,+i) 

if  (i  ^  1),  6,_i  =  oldsn  *  r 

/  =  oldcs  *  r 

g  =  a,+i  ♦  sn 

h  =  a,+i  ♦  cs 

call  ROT{f,g,cs,sn,r) 

call  UPDATE{cs,  sn,  u,,  u,+i ) 

a,  =  r 

/  =  /i 

5  =  ''.+1 

oldcs  =  C5 

oldsn  =■  sn 
endfor 
6n-i  =  h  *  sn 

On  =  h  *  cs 
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This  algorithm  may  also  be  expressed  in  the  following  terser  but  equivalent  form  (we 
will  need  the  expanded  form  above  for  the  analysis): 

oldcs  =  1 

cs  =  I 

for  i  =  1 ,  n  —  1 

call  ROT{a^  *  cs,  bi,  cs,  sn,  r) 

call  U DPATE{cs,sn,v,,v,+i) 

if  (i  ^  1),  6,_i  =  oldsn  *  r 

call  ROT{oldcs  *  r,  a,+i  *  sn,  oldcs,  oldsn,  a,) 

call  U PDATE{cs,sn,u,,u,+i) 
endfor 
h  =  On  *  cs 
6„_i  =  h  *  oldsn 
On  =  h  *  oldcs 

Next  we  discuss  the  stopping  criterion,  i.e.  how  to  decide  when  to  set  an  offdiagonal  6, 
to  zero  and  so  converge.  It  is  important  that  the  stopping  criterion  introduce  error  in  the 
singular  vectors  not  much  worse  than  the  bound  of  Theorem  2.2,  since  otherwise  we  will  not 
compute  the  singular  vectors  to  their  inherent  accuracy.  The  criterion  described  in  [DK88] 
is  the  following. 

Stopping  Criterion:  This  algorithm  decides  when  an  offdiagonal  entry  6,  can  be  set  to 
zero.  0  <  <o/  <  1  is  a  relative  error  tolerance. 

Ml  =  ai 

for  J  =  1 ,  n  -  1 

if  |6j|  <  tol  ♦  fij,  set  bj  =  0 
/ij+i  =  \aj+i\*(fij/{fij  +  |6_,|)) 
endfor 

It  was  shown  in  [DK88]  that  this  criterion  perturbs  the  singular  values  of  B  by  no  more 
than  about  n  ■  tol/2^^'^  when  tol  <  1.  In  sections  4,  5  and  6  we  will  show  that  it  also  hais  a 
small  effect  on  the  singular  vectors,  in  particular  that  it  does  not  change  them  in  direction 
by  more  than  about  0{tol/relgapt),  which  is  the  same  level  as  their  inherent  uncertainty. 
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Thus,  the  overall  algorithm  can  be  summarized  as  follows. 
Bidiagonal  SVD  Algorithm  (simplified) 
Loop: 

Find  the  bottommost  unreduced  submatrix  of  B\  call  it  B. 

(Let  s  and  e  be  the  starting  and  ending  indices  of  B  within  B. 

Then  6^  =  0  if  e  <  n,  6,_i  =  0  if  5  >  1  and  6,  51^  0  for  5  <  i  <  e  -  1.) 
If  5  is  1  by  1  (5  =  e),  we  are  done. 

Apply  the  stopping  criterion  to  B;  if  any  b,  are  set  to  0,  return  to  Loop 

Estimate  the  smallest  singular  value  a  and  the  largest  singular  value  a  of  B. 

if  n  ♦  £/a  <  max(£/<o/,  .01)  then 

Use  implicit  zero-shift  QR  on  B 
else 

Use  standard  shifted  QR  on  B 
endif 

Goto  Loop 

The  difference  between  this  algorithm  and  the  one  in  [DK88]  (besides  some  insignificant 
simplifications)  is  the  use  of  the  test  {n*a/a  <  rmix{e/tol,  .01))  in  place  oi  {n*a/a  <  e/tol) 
to  determine  whether  or  to  use  zero-shift  QR  or  shifted  QR.  The  reason  for  this  change  will 
become  apparent  in  section  6.  In  [DK88]  the  value  of  to!  used  in  the  numerical  tests  was 
tol  =  100  ♦  £,  so  this  change  has  no  effect  on  the  results  reported  there.  The  value  100  was 
chosen  empirically  to  make  the  algorithm  fast,  but  could  easily  be  made  as  large  cls  1000  or 
as  small  as  10  without  greatly  impacting  performance. 

We  summarize  here  the  ways  in  which  the  above  description  simplifies  the  actual  jJgo- 
rithm  of  [DK88],  argue  that  we  have  not  omitted  any  features  which  could  greatly  increase 
the  error,  and  summarize  the  properties  of  the  practical  algorithm  we  do  need  for  the  error 
analysis. 

Bidirectional  QR:  If  5  is  as  given  in  (1.1),  we  define  rev{B)  as  the  matrix 


reviB)  = 


(3.2) 


i.e.  B  with  the  diagonals  reversed.  The  SVDs  of  B  and  Tev{B)  are  simply  related  since 
B  =  PB^ P^ ,  where  P  is  the  permutation  matrix  with  ones  going  from  the  bottom  left  to 
the  upper  right:  B  =  UT.V'^  implies  r€v{B)  =  {PV)I,(PUf.  It  turns  out  that  QR  iteration 
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may  converge  much  faster  applied  to  fev{B)  than  B,  and  so  the  practical  algorithm  tries 
to  exploit  this  and  perform  whichever  one  is  faster.  The  reason  for  the  speed  difference 
is  as  follows.  As  zero-shift  QR  converges,  the  singular  values  appear  on  the  diagonzJ  in 
decreasing  order  from  upper  left  to  lower  right.  If  the  entries  of  B  are  already  graded  in 
this  way,  the  algorithm  will  converge  more  quickly  than  if  they  are  not.  The  algorithm  tests 
for  this  grading  in  a  very  simple  way:  if  lai|  >  |a„|,  QR  is  applied  to  B,  and  otherwise  to 
rev{B).  This  is  not  a  foolproof  scheme,  but  can  quadruple  the  speed  for  strongly  graded 
matrices. 

Since  the  S\'Ds  of  B  and  rev(B)  are  essentially  permutations  of  one  another,  it  suffices 
to  perform  an  error  analysis  either  of  QR  applied  to  B  or  QR  applied  to  rev{B).  Our  error 
bound  will  however  depend  on  the  totaJ  number  of  QR  steps  taken,  and  so  benefit  from  the 
practical  enhancements;  we  summarize  our  numericaJ  experience  with  the  number  of  QR 
steps  required  (or  convergence  in  the  section  11.  Of  course,  in  practice  the  number  of  QR 
steps  is  known  after  the  algorithm  terminates,  and  this  vaJue  could  be  used  in  our  later 
error  bounds. 

Bidirectional  stopping  criterion:  Just  as  QR  can  be  applied  either  to  B  or  rev{B),  so  can 
the  stopping  criterion.  In  fact  we  apply  it  to  both  B  and  rev{B),  no  matter  to  which  of  the 
two  we  apply  QR. 

2  by  2  submatrices:  \\'hen  the  bottommost  unreduced  submatrix  5  is  2  by  2,  we  can 
apply  the  quadratic  formula  to  directly  compute  its  SV'D.  In  practice,  we  implement  it 
quite  carefully  so  that  when  addition  and  subtraction  are  implemented  with  a  guard  digit 
{fl(a  ±  b)  =  (a  ±  b)(l  +  £i),  l^il  <  e)  we  compute  both  the  singular  values  and  singular 
vectors  to  nearly  full  machine  precision,  even  if  the  relative  gap  is  sfnaJl  so  that  the  singular 
vectors  are  ill-conditioned.  In  the  absence  of  a  guard  digit  (the  model  of  arithmetic  in  (5.1)) 
we  only  compute  nearly  the  exact  singular  values  and  vectors  of  a  matrix  which  differs  from 
the  input  in  a  few  bits  in  each  entry.  Thus,  the  global  perturbation  bounds  of  section  2  are 
respected. 

Deflation  when  a,  =  0;  When  some  a,  =  0,  then  the  code  will  automatically  set  6,_i,  6n-i 
and  a„  to  zero;  this  is  because  /  =  /i  =  0  at  the  end  of  each  loop  iteration.  Thus,  this  case 
needs  no  special  consideration  (in  contrast  to  the  standard  SVD  [BDMS79]  which  treats 
this  case  specially). 

Choosing  implicit  zero-shift  QR  versus  standard  shifted  QR:  We  need  to  compute  cheap 
and  reliable  estimates  of  the  smallest  singular  value  a  and  largest  singular  value  a  in  order 
to  decide  whether  to  use  standard  or  implicit  zero-shift  QR.  The  largest  singular  value  a 
can  clearly  be  estimated  to  within  a  factor  of  2  by  the  largest  absolute  entry  of  B.  The 
smallest  singular  value  a  turns  out  to  be  estimated  to  within  a  factor  of  n*'/^  by  min^  ^jt, 
where  the  fik  are  computed  by  the  recurrence  in  the  Stopping  Criterion  above  [DK88].  This 
uncertainty  in  ff  a-nd  a  will  contribute  a  factor  of  n'/^  to  our  final  error  bound. 
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4     Error  Bounds  for  the  Stopping  Criterion 

Let  5  be  a  bidiagonal  matrix  with  positive  entries  Cp,  6,,  and  let  B'  be  the  bidiagonal  matrix 
obtained  from  B  by  setting  6j  =  0  for  some  j,l  <  j  <n-\.  Let  5  and  S'  be  the  associated 
tridiagonal  matrices  of  form  (2.2). 
Define  vectors  Z2j,Z2j+\  through 

Sz2:  =  ejj  (4.1) 

Sz2j+i  =  e2;+i  ,  (4.2) 

where  {e,}  give  the  standard  basis  in  1?".  A  simple  calculation,  using  the  explicit  form  of 
5,  shows  that 

Z2j(i)  =  0      if      ^  is  even  or  ^  >  2j  +  1  (4.3) 

22j+i(^)  =  0  if  ns  odd       or  f  <  2j  (4.4) 

Set 

mi  =  min(52j||22ji|i,  S2j||22j+i||i) 
m'l  =  e""  -  1  >  mi 

mj  =  min(52jlk2;il2   ,    52j||^2j  +  l||2) 

m2  =  e'"'  -  1  >  mj  , 
where  ||  •  ||j,  ||  •  ||2  denote  the  L^  eind  L^  norms  respectively.  It  is  easy  to  see  that  for  a  given 

mi  =  bjHj  (4.5) 

where  ^j  appears  in  the  Stopping  Criterion  of  §3.  Similarly, 

m2  =  6j77j  (4.6) 

corresponds  to  /ij,  but  for  rev  (5)  as  in  (3.2).    (See  the  discussion  of  the  bidirectional 
stopping  criterion  in  §3.) 

Theorem  4.1   Let  B,  B'  etc.  he  as  above.  Let  <7,,a[  be  the  singular  values  of  B,  B'  respec- 
tively, and  let  u,,i»,  andu[,vl  be  the  associated  singular  vectors,  making  angles  6{u,,  u[),&{v,,  u,') 
respectively.  Then  for  I  <  i  <  n, 

_/ 
l-m'f<  e-""  <  —  <  e"'  =  1  +  m'^  ,  (4.7)^ 

where  t  =  1,2. 

Furthermore,  if  m\  <  relgap,, 


m^xism9iu„u[),sin9{v„v',))  <  m',{-^{l  +  ^fT[)  +  ( _i-tl^ )  ^IHI)  ,  (4.8), 

and  if  m^  <  relgap, 

max(sin5(u„u:),sin5(r„v,'))  <  m'2(l  +  _ii±I!^).  (4.8)2 

relgap,  —  m^ 
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Proof.    Let  S{t)  be  the  matrix  obtained  from  S  by  replacing  S2j  by  ts2j,  0  <  /  <  1. 
Note  that  S(l)  =  5,  S(0)  =  S'.  By  (4.3)  and  (4.4),  note  that 

5(0^2;  =  Sz2j  =  €2] 
S{t)Z2j  +  l   =  Sz2j  +  l   =  e2j 

for  all  t  in  the  interval. 

With  the  notation  of  Theorem  2.3, 

/i,(t)     =     25,j(e2j,u;,(t))(e2j+i,u',(<)) 

=     23,.,{S{t)z2j,w,{t))  {e2j+i,  w,(t)) 

=      25,j  ^l,   (Z2j,W,{t))  {e2j  +  u  w,{t))  . 

But  |(e2;+i,u;,(0)|  =  2-1/2  and  |(z2;,u;,(0)|  <  ||z2;|ii||ti;.(0l|oo  <  \\z2j\U2-'/^.  On  the  other 
hand  from  (4.3), 


\{z2j,w,it))\     <     2-i/2||z2j||2lltv(0||2  ,  for  suitable  r', 
=     2-i/2||z2,||2  . 


[4.9) 


Th 


us 


d]og^,{t) 


dt 


<  S2j\\z2]\\i  =  rn(  ,         ^=1,2 


and  integration  gives  the  desired  eigenvalue  bound  (4.7)^,  i  =  1,2. 
We  now  prove  (4.8)2-  Perturbation  theory  gives 


Wk 


dt 


k:^, 


fit    -   f^k 


Wk 


=      •52;  XI  ((^2j'  U'.)  {e2j  +  l,Wk)  +  {e2j,  U;k)(e2j  +  l,  w,))- 


Wk 


=       •S2jXI(^'(^2j,U'i)(e2j  +  l,'^Jfe)  +  ^fc(22j,1i^i)(e2;  +  l,U;,)) 

_     £2j^y 

2  trA^f^'-i^k 


as  S{t)z2j  =  e2j  , 


22;,t^.)(e2j  +  l,"^fc)+   ( —  ]{z2j.UJk)ie2j  +  l,W,) 


/^.   -  M*: 


U'fc 


iZ2j,y}t)ie2j  +  l,Wk)  -  {z2j,Wk)(e2]  +  l,Wt) 


Wk  . 


The  second  sum  is  bounded  in  norm  by 

l(^2;,«^.)l  I|c2j+i||2  +  l(e2j+i,u;.)|  Ik2j||2  <  n/2||22;||2  ,  bv  (4.9). 
The  first  sum  is  bounded  in  norm  by 


max 


M.  +  f^k 


Mt  -  ^Jt 


*•  k=i 


iZ2},Wif{e2,  +  l,'Wk?  +  iz2j,Wkf{e2j  +  uW,)'' 

1/2 


+  2(  22;  ,  U^,)(«2j  +  1 ,  U'.)(e2;  +  1  vU^Jt  )(  22j  .  U-'*  : 
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=     max 


M.  +  f^k 


M.  -f^k 


|li£|M  +  M|l  +  2  .  (2-^/^||.,,||,)  ■  2-^/^  .  \\z,,h] 


1/2 


Combining  terms, 


l^^ll^<M^(^^^^|..(0  +  ..(0, 


dt 


v/2 


i#.     ^i,{t)-Hk[t] 


A  similar  computation  shows  that  the  same  inequality  holds  with  ||2'2j||2  replaced  by  ||22j+i||2- 
Furthermore,  note  that  the  proof  of  (4.7);  shows  that 

\-m',<  e-""  <  e-t^-"""  <  ^^  <  e''"'*'"'  <  e'"'  =  1  +  m' 

and  hence,  arguing  as  in  the  proof  of  Theorem  2.11, 

ll^^^(0|l    ^  m;  1  +  m', 

"      dt      "^  -  72^         re/^ap,  -  m'2  ^  ' 

which  yields  (4.8)<,  upon  integration. 

The  inequality  (4. 8)1  is  proved  in  the  same  way.  Factors  of  order  s/n  appear,  for  example, 
in  estimating  the  term 

E(^2;,u;,)2<(2n-l)(2-i/2||^2,||,  . 


Remark.  The  inequ2dity  (4.7)i  was  proved  in  [DK88]  by  a  different  method. 
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5     Error  Bounds  for  the  Implicit  Zero-Shift  QR  Algorithm 

In  this  section  we  will  derive  error  bounds  for  the  quantities  computed  by  m  steps  of  the 
implicit  zero-shift  QR  aJgorithm  of  section  3.  We  will  use  the  bound  on  ||M(t,  j)||  to  be 
derived  in  section  9  to  bound  the  round  off  error  propagated  from  step  to  step.  Our  main 
results  will  be  Lemma  5.11,  which  bounds  the  relative  errors  in  the  bidiagonal  matrix 
entries  after  m  zero-shift  QR  steps,  and  Lemma  5.14  which  bounds  the  absolute  error  in 
the  computed  orthogonal  matrix  of  the  m  zero-shift  QR  steps. 

In  the  error  analysis,  we  will  use  the  fact  that  an  absolute  perturbation  e  in  logi  is  to 
first  order  equivalent  to  a  relative  perturbation  e  in  i: 

logx(l  -I-  f)  =  logx  -I-  log(l  -[-  f)  =  logx  -I-  e 

Therefore,  the  Jacobian  map  M{j,i)  which  propagates  absolute  perturbations  in  the  log- 
arithms of  entries  of  B,  to  Bj  also  propagates  relative  perturbations  of  entries  of  B,  to 
B,. 

As  is  traditional  in  numericaJ  analysis,  we  will  bound  quantities  of  the  form  n(  1  +  fi )  by 
instead  bounding  5  =  J^  |f,|.  When  5  <C  1  (the  case  of  interest),  we  then  have  approximately 
that  I  -  3  <  n(l  +  ft)  <  1  +  •«.  If  more  rigor  is  desired,  we  can  use  the  fact  that  for  all 
s  <  1  we  have  I-  s  <  n(l  +  ft)  <  e'. 

Our  model  of  arithmetic  is  a  variation  on  the  standard  one:  the  floating  point  result 
//(•)  of  the  operation  (•)  is  given  by 


//(a  ±6)    =     fl(l  +  £i)±6(l  +  £2) 

//(ax  6)     =     (ax6)(l  +  f3)  (5.1) 

//(a/6)    =     ia/b){]+e,) 

fl(V^)       =       v^(l+£5) 

where  \e,\  <  £,  and  £  <  1  is  the  machine  precision.  This  is  somewhai  more  general  than 
the  usual  model  which  uses  fl(a  ±b)  -  (a  ±  6)(  1  -I-  £1 )  and  includes  machines  like  the  Cray 
which  do  not  have  a  guard  digit.  We  do  not  consider  over/underflow;  methods  for  extending 
error  analysis  to  include  underflow  are  presented  in  [Dem84]. 

Our  analysis  proceeds  by  five  lemmas.  Lemma  5.2  [DK88]  analyzes  the  roundoff  errors 
in  the  subroutine  ROT  of  section  3.  Lemma  5.4  uses  Lemma  5.2  to  bound  the  errors  in 
the  bidiagonal  matrix  B  after  one  step  of  the  implicit  zero-shift  QR  algorithm.  Lemma 
5.11  uses  Lemma  5.4  and  the  bound  on  ||M(j,  i)||  of  section  9  to  bound  the  errors  in  the 
bidiagonal  matrix  after  m  steps  of  the  implicit  zero-shift  QR  algorithm.  Lemma  5.13  shows 
that  small  errors  in  the  sines  and  cosines  computed  by  one  step  of  the  aJgorithm  only  cause 
smatll  errors  in  the  computed  orthogonal  matrices  containing  the  singular  vectors.  Finally 
Lemma  5.14  bounds  the  absolute  error  in  the  computed  orthogonal  matrices  containing  the 
singular  vectors  after  m  steps  of  the  algorithm. 

Lemma  5.2  Let  cos  ^,  sin^  and  p  denote  the  exact  outputs  of  ROT  for  inputs  f  and  g  and 
exact  arithmetic.  Let  cs  =  {I  +  (a)  cos 9,  sn  =  {\  +  f,„)sin^  and  r  =  (l  ■{■  (r)p  denote  the 
floating  point  results  of  ROT  applied  to  the  perturbed  inputs  f  =  {\  +(j)f  and  g  =  (\  +  (g)g. 
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Then  to  first  order  we  may  bound  the  relative  errors  e^,  €,n  and  (r  in  terms  of  if,  eg  and 
the  machine  precision  £  as  follows: 

kcl     <     {\(f\  +  \(g\)s\n^9+  — 

|€,n|       <       (k/|  +  k,|)cOs2^+^  (5  3^ 

,      ,  ,       ,  2  /,        ,       I     •     2  /I         13£ 

Kr|      <       |€/|  COS^  +  |fp|  Sin    6 -\ 

4 

Proof.  See  the  proof  of  Lemma  5  in  [DK88].  The  slightly  different  model  fl(a  ±b)  = 
(a ±b){l  +  £i)  used  in  [DK88]  does  not  affect  the  final  result  because  only  positive  quantities 
are  added  in  ROT;  this  makes  the  two  models  equivalent.  D 

To  state  the  next  lemma,  we  need  to  be  able  to  distinguish  the  different  values  the 
variables  in  the  implicit  zero-shift  QR  algorithm  take  on  at  different  times.  To  this  end,  we 
state  the  following  equivalent  algorithm,  where  the  variables  are  labeled  by  the  loop  counter 
t: 
Labeled  Implicit  Zero-Shift  QR  Algorithm: 

oldcs\  =  1 

/i  =  0.1 

g\  =  h 

for  I  =  1,  71  -  1 

call  ROT{ft,g„cs^i,sna,r,i) 

call  U DPATE{cs,i,sn,i,v,,v,+i) 

if  (t  ^  1),  6j_i  =  oldsUt  *  r,i 

fii  =  oldcs,  *  r,i 

9x1  =  flt+i  ♦  snu 

call  i20T(/,i,5,i,c5,2,5Ti,2,r,2) 
call  UPDATE{cs,2,sn,2,u„u,+i) 
a,  =  r,2 
/i+i  =  h, 
ffi+i  =  ^+1 
oldcsi+i  =  C,S,2 
oldsn,+i  =  5n,2 
endfor 

in-l   =  /in-l  *  Snn-1^2 
On  =  /l„_l  *C5„_i,2 

Let  the  n  by  n  bidiagonal  matrix  B  have  diagonal  entries  a,  and  offdiagonal  entries  b,. 
Let  the  matrix  B'  with  entries  a[  and  b',  be  the  the  result  of  applying  the  implicit  zero-shift 
QR  algorithm  to  B  once  in  exact  arithmetic,  and  let  the  variables  in  the  labeled  implicit 
zero-shift  QR  algorithm  above  denote  the  corresponding  intermediate  values. 

Now  let  B  be  the  slightly  perturbed  matrix  with  entries  d,  =  a,(l  -|-  €„,)  and  b,  = 
6,(1  +  €6.),  and  let  B'  (with  entries  a[  =  a',{l  +  €<.- )  and  b',  =  b',{l  +  e^.))  be  the  result  of 
applying  the  implicit  zero-shift  QR  algorithm  in  floating  point  arithmetic  to  B.  Let  hatted 
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variable  denote  the  corresponding  intermediate  floating  point  variables  (e.g.  /,  =  /i(l  +  f/,)). 
We  wish  to  bound  the  final  relative  errors  e^i  and  tj,-  in  the  entries  of  B'  and  the  relative 
errors  fci.j  f»h,i,  fcjij?  and  f»h,j  in  the  sines  and  cosines  in  terms  of  the  initial  relative  errors 
€ai  and  f6,  and  the  machine  precision  e. 

Lemma  5.4  In  terms  of  the  notation  just  introduced, 

max(|€„,|,|€y|)  <  6(2n  -  1)  max(|fa.i,  Ifi.i)  +  (-i7n  -  27)£  (5.5) 

and 

fc.,.n  =  max(kc*.,  Mf»>ml'|fc-..2Mfj>..2l)  <  5(2n  -  1)  max(|fa.|.  |ffc,l)  +  (41n  -  66)£     (5.6) 
t  t 

Remark  5.7  Lemma  5.4  is  a  variation  on  Lemma  7  in  [DK88]. 

Proof.  We  begin  systematically  applying  (5.1)  and  Lemma  5.2  to  all  the  operations 
in  the  labeled  implicit  zero-shift  QR  algorithm  in  order  to  derive  a  recurrence  relation  for 
|€/,|  and  \€Mc,,\-  Initially 

/i  = /i(l  +  e/,)  =  ai(l  +  €aj    and     pi  =  5i(l  +  e^J  =  6i(l  +  €6, ) 

so  that  e/j  =  fo,  amd  fjj  =  cj, .  At  the  top  of  the  loop  we  always  have  p,  =  6;  and  so 
(g^  =  fj,^ .  After  the  first  call  to  ROT  we  have 

C5,i  =  C5,i(l -|-€c*., )       where        kc».i  I  ^  (k/.l  +  kb.l)'Sn?i  +  ^ 

2l£ 

sn,i  =  sn,i{l  +  e,n„)       where        U,n,i\  <  {Uf,\  +  \(b,\)cs^\  +  — 
fii  =  rii(l  +  er,^)       where        l^rj  <  \€f,\cs]^  +  \eb,\sn'^i  + — 

Next,  we  get 

/ii  =  /.i(l  +  f/.i)       where        |€/.,  |  <  \eoidc,.\  +  \(rj  +  c  <  \(otdc3,\  +  |f/.l«fi  +  \€b,\sn]i  + 

5,1  =ff,i(l  +  fs„)       where        |ej.,|  <  |e<,.^J  +  |f,„.J  +  £  <  |ea.+,|  +  (k/.l  +  Ub.Dcs^i  +  — 

25f 
A,  =  /i,(l  +  eA,)       where        |c/,.|  <  |€a.+,  I  +  kc*,,!  +  «^  <  i^a.+J  +  (|f/.l  +  kfc.l)5"a  +  — 

After  the  second  call  to  ROT  we  have 

C3i2  =  cs,2{l  +  (ct.i)       where     |ec*.j|<  {|fo/Jc*.|  +  !«».+,  I  +  2|€y.|c5?,  +  |€6.|).sn?2  +  — 

«r»,-2  =  571,2(1 -1-e.n.J       where    |e,„.3|<  (ko/dc^.i  +  ka.+i  I  +  2|f/,|c5?i  +  ki.Dcsfj  + — 

r,2  =  '•,2(  1  +  f r,2 )       where     \er,^  \   <  ko/dc5.  k-s?2  +  ka,+i  knfj  +  k/.  k^fi 

11/2  2  2         2   \  00£ 

+  k6.|(C5,2Sn,i  +5nf2C5fi)+  — - 

4 
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Since  o/dcs,+i  =  C5.2  and  /,+i  =  h„  we  have  ioidc,,+r  =  ^€,,2  and  f/,+i  -  f/,,,  so 


< 

2c5.Vn?2 

0 

3Tl?2 

+ 

5n?i        1 
571^2     sn]^ 

k6.l 

+ 

25 
63 

£ 

4 

We  may  write  this  as 

£i+i  <  A.  •  £.  +  F.  ^  G. 

where 

k/.l 

,  -F.  = 

'  1  " 
1 

(k6.l  +  |fa.,J)    and      G.  = 

'  25  ' 
63 

£■ 
4 

This  implies 

5n?i      0 
2c5?,     1 


Since  componentwise 
we  finally  get  that 


5n,'i  •  ■  •  5n 


j+1,1 


2(l-5n.V--5n^^+i,i)     1 


< 


1  0 

2  1 


ko/tfcj.+i  ! 


< 


t+i 


25 
113 


(5.8) 


Now  we  can  bound  |eo'|  and  |€i,<|.  From  the  algorithm  we  see  for  i  <  ra  —  1  we  have 
a\  =  r,-2  so  that  i^>  —  fr.j-  Substituting  the  bounds  for  Ifo/icj.l  and  |€/,|  from  (5.8)  into  the 
bound  for  If^.j!  and  simplifying  yields 

t'  1+1 

|fa:l<4(^|efcJ  +  53|eaJ)  +  (35i-20)£      where       t<n-l 

We  also  see  from  the  algorithm  that  a„  =  /i„_i  *  C5„_i,2,  which  implies  |€a/  |  <  £  +  |f/i„_i  I  + 
|€cj„_i  2 1-  Substituting  the  bounds  for  IfoWc*.  I  and  |f/, |  from  (5.8)  into  the  bounds  for  |f^„_,  | 
and  |€cj„_i  2 1'  adding  and  simplifying,  yields 

n  — 1  n 

kaj  <  5(^  |€bj  +  ^  |e,J)  +  (41n  -  58)£ 
j=i  j=i 

Next  we  see  from  the  algorithm  that  6,  =  oldsn,^i  *r,+  ij  for  t  <  n-2.  Since  o/d^n.+i  = 
«n,-2,  this  implies  lej'l  <  e  +  |€,n,j|  +  |fr,+,|.  Substituting  the  bounds  for  |€o/(icj. I  and  |e/, | 
from  (5.8)  into  the  bounds  for  l^jn^l  and  |fr,+,|.  simplifying  and  adding  yields 

1+1  1+1 

|f6;l<6(5;k6j  +  5^kaJ)  +  (47n-27)£     where      i<n-2 
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We  also  see  from  the  algorithm  that  6„-i  =  /in-i  ♦  snn-\,2^  which  implies  Ifj,-      I  <  5  + 

1  —  1 

|fA„_i  I  +  \^ann-i,i\-  Substituting  the  bounds  for  |co/dci,  I  and  |e/J  from  (5.8)  into  the  bounds 
for  |€a„_i1  aJid  |€,„„_jjl,  simplifying  and  adding  yields 

n  — 1  n 

J=l  J=l 

Combining  the  last  four  displayed  inequalities  yields  claim  (5.5)  of  the  lemma. 

Next  we  bound  the  relative  errors  in  the  computed  sines  and  cosines  f^n,,  <  ^ jn.2  ^  fcj.i 
and  fci,2-  Substituting  the  bounds  for  |fo/d<:s,  I  and  |€/,  |  from  (5.8)  into  the  earlier  bounds 
on  the  \(,n,-i\,  Ifjn.jl,  kc5.ll  and  |ecj.j|  and  simplifying  yields 

I  t 

t  t+1 

k.n. J  <  5(^  ktj  +  E  Is  I)  +  (4b- -  25)£ 
I  t 

I  1  +  1 

\(csj  <  5(1]  |€6j  +  E  ISl)  +  (41^  -  25)£ 

j=l  ;=1 

Combining  the  la.st  four  inequalities  yields 

n— 1  n 

fc5,.„  =  max(|f,„.J,k,„.j|,|e,,.,|,|e„.2l)  <  5(^  k^J  +  ^  kaj)  +  (41n  -  66)£ 

j=i  j=i 

implying  (5.6).  D 

Now  we  bound  the  relative  error  in  the  computed  bidiagonal  matrix  after  m  steps  of 
the  implicit  zero-shift  QR  algorithm.  For  reasons  which  will  become  clear  shortly,  we  will 
denote  our  initial  bidiagonal  matrix  by  fi(°°),  the  matrix  after  m  steps  of  the  algorithm  in 
exact  arithmetic  by  5''"°^  and  the  matrix  after  m  steps  of  the  algorithm  in  floating  point 
arithmetic  by  B'"""K  We  wish  to  bound  the  maximum  componentwise  relative  difference 
between  5^"*°*  and  5("""),  which  we  measure  by 


reldif{B^"'°K  £("»'"))  =  max  |  log 


^(mO) 


1}  -    d('"0)  ' 

'] 

the  subscripts  i,j  varying  over  the  bidiagonal  entries. 

The  proof  will  use  the  matrix  M(mi,m2),  which  is  the  Jacobian  matrix  of  the  transfor- 
mation induced  by  the  algorithm  in  going  from  the  mi-st  to  the  m2-nd  bidiagonal  matrix, 
but  in  the  variables  logo,  and  log  6,.  Because  small  absolute  perturbations  of  the  logarithms 
logo,  +  {  are  the  same  as  small  relative  perturbations  of  the  matri.x  entries  ]oga,(l  +  t). 
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M(mi,m2)  measures  how  relative  errors  in  the  mi-st  bidiagonal  matrix  are  propagated  to 

the  m2-nd. 

M(mi,  mj)  will  be  analyzed  in  detail  in  section  9,  but  we  need  just  one  result  (Theorem 
9.23)  from  that  section: 

||A/(mi,m2)||^  <(8n-4)(m2-mi)  +  0(l)  (5.9) 

where  n  is  the  dimension  of  the  bidiagonal  matrix.  The  numerical  simulations  in  section  11 
show  that  the  0(1)  can  indeed  be  replaced  by  zero. 

Since  the  bound  (5.9)  does  not  depend  on  the  bidiagonal  matrix,  it  can  be  used  to  get 
global  error  bounds:  if  B\  and  B2  are  two  bidiagonjJ  matrices,  and  Bi{m)  and  B2{m)  are 
the  matrices  after  m  applications  of  the  aJgorithm  in  exact  arithmetic,  then 

reldifiBiim),  B2{m))  <  ((8n  -  4)m  +  0(1))  ■  reldifiBuBi)  (5.10) 

Lemma  5.11   In  terms  of  the  above  notation 

re/di7(5<"'°\5<"""')  <  £(18871^  -  282n  +  54)(m2  -  m  +  1)  +  0{nms) 

Remark  5.12  In  practice,  as  illustrated  by  the  numerical  experiments  of  section  11,  the 
0{nm£)  term  may  be  replaced  by  0.  By  assuming  that  the  angles  encountered  in  the  course 
of  the  algorithm  are  bounded  away  from  7r/2  (which  is  reasonable,  since  they  approach 
zero  in  the  limit),  the  n^  dependence  may  be  replaced  by  something  proportional  to  n  (see 
Lemma  8  in  [DK88]). 

Proof.  Let  5(**)  denote  the  computed  bidiagonal  matrix  after  k  steps  of  the  algorithm 
in  floating  point  arithmetic.  Starting  from  B^''''\  consider  the  sequence  B^^'^^''\  ^Cf+^.fc)^ 
. . .,  £('"'')  which  would  be  computed  by  applying  the  ailgorithm  in  exact  arithmetic  to  5***'. 
We  will  bound  reldif{B'^"'°\B^"'"'^)  by 

Teldif{B^"'°\B^"'"'^)  <  £re/dz7(5<'"'=),5('"''=-^') 

Now  B^™*)  and  jPi'"'*-!)  arise  from  applying  the  algorithm  in  exact  arithmetic  to  fi'*'''' 
and  5(*'*-i),  respectively.  Therefore,  by  (5.10)  the  relative  difference  between  them  will  be 
bounded  by 

re/di/(5('"*',  S*'"'*-^))  <  ((8n  -  4)(m  -  k)  +  0(l))re/di/(5(^*',  5<*''=-'') 

But  5^**)  is  obtained  from  5(*=-i.*-i)  by  one  step  of  the  algorithm  in  floating  point  arith- 
metic, and  5(*=.*-i)  is  obtained  from  bC^-i^*-!)  by  one  step  of  the  algorithm  in  exact 
arithmetic.  Therefore,  re/<ii7(5<**),5<*'*-i))  is  bounded  by  (5.5)  in  Lemma  5.4: 

reWz7(S<**),J3<*'*-'))  <  (47n  -  27)£ 

Combining  the  last  three  displayed  equations  yields  the  claimed  result.  D 

We  note  that  the  method  of  proof  is  analogous  to  the  way  error  bounds  are  derived  for 
computed  solutions  of  initial  value  problems  for  differential  equations:  The  truncation  error 
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at  each  step  is  analogous  to  our  one  step  error  bounded  in  Lemma  5.4.  Then  the  differential 
equation  being  solved  is  used  to  propagate  the  error  bound  for  the  truncation  error  forward; 
this  is  exactly  what  we  are  doing. 

The  next  lemma  shows  that  if  the  mziximum  error  fc*,»n  in  the  computed  sines  and 
cosines  is  small,  so  will  be  the  error  in  the  singular  vectors  computed  by  U PDATE: 

Lemma  5.13  Let  V  by  and  n  by  n  orthogonal  matrix,  and  V  the  updated  orthogonal  matrix 
obtained  by  running  one  step  of  the  implicit  zero-shift  QR  algorithm  in  exact  arithmetic. 
Let  6V  be  a  perturbation  of  V ,  and  let  V  +  6V'  be  the  matrix  obtained  by  running  the 
algorithm  in  floating  point  on  V  +  6V ,  where  we  assume  the  relative  errors  in  the  computed 
sines  and  cosines  are  bounded  by  fcs,tn-    Then  to  first  order  in  ecs,sn,  £,  ond  ||<5V'||2 

\\6V%  <  23/2(„  _  1)^  +  2^/2(„  _  1),^^^^^  +  ||^v/||^ 

Proof.  It  suffices  to  ancdyze  the  errors  from  one  call  to  U PDATE.  Let  cs  and  sn  be 
the  true  values  of  the  cosine  and  sine,  and  cs{ I  +  (cs)  and  sn(  1  +  (sn)  the  perturbed  values. 
Let  Vi  and  V2  denote  the  two  columns  of  V  being  modified.  Then  their  true  new  values  are 


[v[,v'2]  =  [Ui,l'2] 


cs     —sn 
sn      cs 


Let  [Svi ,  SV2]  be  the  perturbation  of  [vi ,  ^2]  due  to  all  previoiis  contributions.  Then  the  j-th 
components  of  the  new  perturbation  after  computing  can  be  written  (to  first  order  in  (ca.sn, 
e  and  \\6V\\^)  as 


where  |e,|  <  e.  Thus 


cs{(c3  +  Si -^  S2)      -sn{(,rx+  S3  +  S4] 

Sn(€,rx  +  £5  +  ^e)         CS{(c,  +  €7  +  £s) 


+  [(5i-ij,(5i'2j] 


cs     -sn 
sn      cs 


\\[Sv[,6v'2]\\,  <  2'/2(e„,,„  +  2e)  +  \\[6 vi , S V2]\\2 


and  applying  this  bound  n  -  1  times  (for  each  call  to  UPDATE)  we  get  the  result  claimed 
in  the  lemma.  D 

Lemma  5.14  Let  B  be  an  n  by  n  bidiagonal  matrix,  and  let  V  and  U  be  the  orthogonal 
matrices  obtained  by  running  the  implicit  zero-shift  QR  algorithm  m  times  in  exact  arith- 
metic. In  other  words,  set  V  and  U  to  n  by  n  identity  matrices  initially,  and  let  them  be 
modified  by  the  calls  to  U PDATE  in  the  algorithm.  A'ou;  let  V  and  U  be  the  floating  point 
matrices  obtained  by  running  the  algorithm  in  arithmetic  of  precision  t.  Then  to  first  order 
in  e  we  have 


majc( 


V  -V 


U  -U 


)  <  947n\m^  +  m)£  +  Oisn^m^) 


(5.15) 


Remark  5.16  In  the  numerical  experiments  of  section  11,  the  O(fn^m^)  term  abovp  may 
be  replaced  by  0.  By  assuming  that  the  angles  encountered  in  the  course  of  the  cdgoiithm 
are  bounded  away  from  t/2  (which  is  reasonable,  since  they  approach  zero  in  the  limit), 
the  n*  in  the  error  bound  may  be  replaced  by  something  proportionzd  to  n'  (^'"e  Lemma  8 
in  [DK88]). 
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Proof.   By  Lemma  5.13,  after  each  application  of  the  algorithm  the  error  in  V  increases 


by 


V2{n  -  l)(€c.,,n  +  20 

By  Lemma  5.4,  at  the  A:-th  stage  €„,,„  is  bounded  by 

\(c,,,n\  <5(2n-  l)reldif{B^''''\B^^°^)  +  (41n-66)£ 
By  Lemma  5.11,  re/dt/(5<**),  BC^"')  is  bounded  by 

r€ldif{B^''''K  3^''°^)  <  e{l88n'^  -  282n  +  5A){k^  -  k  +  1}  +  0{nk£) 
Combining  the  last  three  displayed  expressions  yields 


V  -V 


<  ^v/2(n-l)-(5{2n-l)[£(188n^-282n+54)(fc^-fc+l)+0(nit£-)]+(41n-66)£+2£) 


ife=i 


which,  when  simplified,  yields  the  desired  result.  D 
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6     Global  Error  Bounds  on  the  Computed  Singular  Vectors 

In  this  section  we  present  the  main  practical  contribution  of  the  paper:  an  error  bound  for 
singular  vectors  computed  by  the  overall  bidiagonal  SVD  adgorithm: 

Theorem  6.1    The  error  in  the  computed  the  i-th  left  and  right  singular  vectors  is 

p{n,  m)tol/relgap, 

where  to!  is  the  desired  relative  precision  input  to  the  algorithm,  p{n,Tn)  is  a  low  order 
polynomial  in  the  matrix  dimension  n  and  number  of  QR  steps  m,  and  the  relative  gap 
relgap,  was  defined  in  1.7. 

Proof.  To  perform  the  error  analysis  we  need  to  associate  a  tree  with  the  execution 
of  the  algorithm.  Nodes  of  the  tree  will  correspond  to  unreduced  submatrices  5,  upon 
which  the  algorithm  operates.  The  root  node  corresponds  to  the  original  matrix  and  the 
leaf  nodes  correspond  to  1  by  1  and  2  by  2  submatrices  where  the  algorithm  hsis  converged 
(recall  that  2  by  2  matrices  are  handled  specially).  A  directed  edge  from  node  5,  to  node 
Bj  will  mean  that  Bj  is  obtained  from  fl,  by  performing  some  step  of  the  algorithm.  There 
are  three  kinds  of  edges:  "stopping"  edges,  "zero-shift  QR"  edges,  and  "shifted  QR"  edges. 
Stopping  edges  correspond  to  the  stopping  criterion  deciding  to  set  one  or  more  off  diagonal 
entries  of  B,  to  zero.  In  this  case  Bj  is  a  submatrix  of  B,.  If  the  stopping  criterion  sets  p 
offdiagonal  entries  of  B,  to  zero  at  the  same  time,  5,  will  have  p  +  I  child  nodes,  one  for 
each  resulting  submatrix.  A  zero-shift  QR  edge  corresponds  to  one  or  more  applications  of 
the  zero-shift  QR  adgorithm.  A  zero- shift  QR  edge  connecting  fi,  to  Bj  represents  all  the 
zero-shift  QR  steps  applied  to  B,  before  the  stopping  criterion  is  satisfied.  Fincdly,  if  the 
sequence  of  QR  steps  by  which  Bj  is  obtained  from  5,  includes  at  least  one  shifted  QR  step, 
then  Bi  is  connected  to  Bj  by  a  shifted  QR  edge.  Normally  a  shifted  QR  edge  wiU  represent 
only  shifted  QR  steps.  However,  our  estimates  a{B,)  and  a(5,)  of  the  smallest  and  largest 
singular  values  of  B,,  which  are  used  to  choose  between  shifted  QR  and  zero-shift  QR,  are 
not  perfect,  so  there  is  a  chance  the  algorithm  could  apply  both  kinds  of  QR  steps  to  the 
same  submatrix.  As  we  will  see,  this  does  not  impact  the  error  analysis. 

Thus,  any  path  from  the  root  of  the  tree  to  a  leaf  node  starts  with  a  QR  edge  of  either 
type,  and  then  alternates  between  stopping  edges  and  QR  edges  until  finally  hitting  a  leaf 
node  at  the  end  of  a  stopping  edge.  A  node  can  have  at  most  one  entering  edge  (stopping 
or  QR),  and  either  one  exiting  edge  (which  must  be  a  QR  edge)  or  more  than  one  exiting 
edge  (which  must  be  stopping  edges). 

The  proof  of  the  theorem  proceeds  by  induction  from  the  leaves  of  the  tree  toward  the 
root.  We  will  show  that  if  the  computed  singular  vectors  of  all  the  children  of  a  parent  node 
satisfy  the  error  bound  0{tol/relgapk),  then  so  do  the  computed  singular  vectors  of  the 
parent  node.  First  we  explain  the  induction  without  computing  detailed  error  estimates, 
and  then  we  include  the  error  estimates.  In  the  proof  the  notation  O(-)  will  absorb  all 
dependence  on  dimension  and  number  of  QR  steps. 

First  consider  the  leaf  nodes,  which  are  all  1  by  1  and  2  by  2.  There  is  nothing  to  prove 
for  the  1  by  1  nodes,  and  for  2  by  2  nodes  the  special  subroutine  discussed  in  section  3 
computes  the  singular  vectors  with  the  desired  error  bounds. 
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Now  Suppose  B,  is  the  first  node  on  the  path  from  the  root  whose  exiting  edge  is  a  shifted 
QR  edge.  In  other  words,  only  zero-shifted  QR  steps  have  been  applied  to  B,  so  far.  Starting 
with  B,,  the  algorithm  essentially  reverts  to  the  standard  shifted  SVD  algorithm,  which  is 
backward  stable  in  the  usual  normwise  sense  and  so  computes  the  A:-th  singular  vectors  of 
5,  with  an  error  bound  0{tol/absgapk).  where  absgapk  =  min;  |(7/t(5, )  -  ai(Bi)\/eTi{B,). 
From  the  algorithm  of  section  3,  we  see  that  shifted  QR  is  used  only  when  a(B,)/W{B,)  > 
n~^  ma.\(i//o/,.01)  >  .Ol/n,  i.e.  only  when  the  smallest  singular  value  of  B,  is  not  more 
than  about  .Ol/n  times  smaller  than  the  largest  singular  value.  This  implies  that  the  relative 
gap  relgapk  =  min/  |cTfc(5,)  -  at{B,)\/\ak{B,)  +  <7i{B,)\  cannot  be  more  than  about  200n 
times  larger  than  the  absolute  gap  absgapk-  Thus  the  error  bound  for  the  fc-th  computed 
singular  vectors  of  5,  are  still  Oitol/relgapk)  as  desired. 

(This  is  where  we  use  the  modification  of  the  original  algorithm  from  [DK88].  If  the 
threshold  for  using  shifted  QR  had  been  n~'^£/tol  instead  of  ra"^  max(i:/to/,  .01),  the  error 
bound  would  have  been  0(tol'^ I(e -relgapk))  instead  oi  0(tol /relgapk).  For  tol  just  slightly 
larger  than  e  there  is  no  diflference,  but  for  tol  approaching  e^/'^  the  bound  0[toi^/(£  ■ 
relgapk))  is  significantly  weaker  than  0(tol/relgapk).) 

At  this  point  in  the  induction  we  have  shown  that  the  error  bounds  for  computed  singular 
vectors  are  0(£/Telgapk)  for  jlII  nodes  which  only  have  zero-shift  QR  edges  and  stopping 
edges  between  them  and  the  root.  First  consider  stopping  edges.  Suppose  B,  is  the  parent 
node  from  which  the  stopping  edges  issue.  By  Theorem  4.1,  the  stopping  criterion  only 
changes  the  singular  vectors  by  O(tol/relgapk),  where  relgapk  is  measured  with  respect  to 
the  singular  values  of  B,  only.  This  relgapk  may  be  larger  than  the  relgapk  measured  with 
respect  to  the  entire  matrix  (since  5,  contains  only  a  subset  of  the  spectrum  of  the  original 
matrix),  but  this  only  improves  the  error  bound  0(tol/relgapk).  This  lets  us  moves  the 
induction  toward  the  root  along  stopping  edges. 

Finadly,  consider  a  zero-shift  QR  edge  from  B,  to  Bj.  Let  true{Bj)  denote  the  matrix 
that  would  have  been  computed  from  B,  in  exact  arithmetic,  and  U  and  V  the  orthog- 
onal matrices  that  transform  true(Bj)  to  B,:  U^  ■  true(Bj)  •  V  =  B,.  We  will  consider 
right  singular  vectors  only;  the  proof  for  left  singular  vectors  is  identical.  Let  the  nota- 
tion true.vectoTk(B)  denote  the  true  fc-th  right  singular  vector  of  the  matrix  B,  and  let 
comp.vectorki  B)  denote  the  computed  fc-th  right  singular  vector  of  B.  We  want  to  show  that 
\\true.vectork(B,)  -  comp.vectork(B,)\\2  =  0{tol/relgapk}.  Note  that  true. vector k(Bi)  = 
V'^  ■  tr ue. vector k{true{B J ))  and  comp.vectork{B,)  =  V^  ■  comp.vectorki B^),  where  the 
difference  between  V  and  V  is  bounded  in  Lemma  5.14.  Consider  the  following  triangle 
inequality: 

.    \\true.vectork{Bi)  -  comp.vectork(B,)\\2 

=    V'^  ■  true.vectoTk{tTue{Bj))  -  1/^  •  comp.vectoTk{Bj) 

<  II V-^  •  tTue.vectork{true{Bj))  -  V'^  ■  true.vectork{true{Bj))      + 

V^  •  true.vectorkitrue{Bj))  -  V'^  •  true. vector k{Bj)  ^  +  (6.2) 

II V'-^  •  true.vectorkiBj)-  V^  ■  coTnp.vectork(Bj] 
=  Ii+  h  +  h 
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By  Lemma  5.14,  /i  =  0(5).  By  Theorem  2.12  and  Lemma  5.11,  I2  =  0{s/relgapk).  By  the 
induction  hypothesis,  /a  =  0{tol/relgapic).  Thus,  the  sum  Ii  +  I2  +  I3  =  0{tol/relgapk)  as 
desired.  This  completes  the  induction. 

Now  we  consider  more  rigorous  bounds.  Explicit  formulas  for  such  bounds  would  be 
quite  complicated  and  pessimistic  and  shed  little  new  light  on  the  problem.  However,  it 
is  illuminating  to  use  the  tree  to  explain  how  errors  accumulate.  We  use  the  fact  that  if 
V'l  and  \'2  are  orthogonal,  and  6\\  and  6V2  are  small  perturbations,  then  to  first  order 
IH'iVj  -  (V'l  +<^li)(l2  +  «5V2)||2  <  ll<5V'i||2  +  ||«5V2||2.  This  means  that  to  first  order  pertur- 
bation bounds  simply  add  as  we  proceed  up  the  tree.  Thus,  every  time  we  move  along 
an  edge,  we  add  the  error  contributed  by  that  edge.  For  leaf  nodes  which  correspond  to 
2  by  2  submatrices,  we  use  error  bounds  for  the  special  subroutine  discussed  in  section  3. 
For  stopping  edges  we  used  the  bounds  of  Theorem  4.1.  For  nodes  whose  exiting  edge  is 
the  first  shifted  QR  edge,  we  can  use  the  error  bounds  for  the  conventional  SVD  algorithm 
[GK65].  For  zero-shift  QR  edges  we  use  the  analysis  of  the  last  paragraph.  Only  the  edges 
connecting  a  leaf  to  the  root  contribute  to  the  error  for  the  singular  vector  corresponding 
to  the  leaf.  D 
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7      Flows  and  the  SVD  Algorithm 

For  a  matrix  A,  let  A_  denote  its  strictly  lower  tridiagonal  part  and  set  Tro(A)  =  A_  —  A^.. 
We  will  consider  flows  on  invertible  real  matrices  A  of  the  form 

^  =  A(7ro(F(A^A)))  -  (;ro(F(A  A^))  )A  ,         A(<  =  0)  =  Ao  ,  (7.1) 

at 

where  F  is  a  smooth,  real- valued  function  on  (0,oo). 

Such  flows  were  first  considered  in  the  singular  value  context  by  M.  Chu  [Chu],  who 
analyzed  the  Toda  case,  F{x)  =  x.  For  SVD  we  will  set  F{x)  =  logx,  but  initially,  for 
reasons  of  general  interest  and  to  suggest  additional  possibilities,  we  will  consider  general 
F. 

Convention.  By  the  QR  factorization  of  a  real,  invertible  matrix  X,  we  mean  A'  = 
QR,  where  Q  is  orthogonal  and  R  is  upper  triangular  with  positive  diagonal  entries  (see 
[GVL83]). 

Theorem  7.2   Equation  (7.1)  has  a  unique,  global  solution  A(t)  which  preserves  the  sin- 
gular values  of  A{t).    The  flow  can  be  solved  explicitly,  as  follows.   Let 

e^Fi-^L^^)  =  Q^^t)  R,[t)  (7.3) 

be  the  QR- factorizations  o/ e'^^-'^o  ■^"^  and  e'^^-'^°'^o)  respectively.   Then 

A{t)  =  Q^{t)  Ao  Qiit)  .  (7.5) 

Finally,  (7.1)  preserves  bidiagonality  i.e.  if  .Aq  is  bidiagonal,  then  A(t)  is  bidiagonal  for  all 
t  >  0.   Moreover  the  signs  of  its  nonzero  entries  are  preserved. 

Proof.    Differentiation  of  (7.3)  and  (7.4),  give 

j^Qi  =  Qi  -o(Qfi^(AjAo)gi)  -  Qi  MFiiQlAoQifiQ^AoQi)))  =  Q.Mn-Mtf-Mt))) 

and 

^^2  =  Q2MQ2  n^oA^)Q2)  =  Q2MF{iQ^AoQi){QlAoQif))  =  Q2MF{-mMtf)). 
Thus 

^A(0     -     [  -  MF{A(t)  A{tf))  Q[]  AoQi  +  Q^  Ao[Qi-o(F{A(tf  Ait)))] 
=     A{t):ro{F{Ait)^A{t)))  -  7ro{F(A{t)A(t)^ ))  .i{t)  , 
which  is  equation  (7.1).  From  (7.3),  (7.4),  (7.5), 

.4(0     =     (i?2(0e-'^'-^°<))  Ao(e'^(^o^-^°)(ii:i(0)-') 

=     i?2(0  AoiRdt))-'  ,  (7.6) 
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from  which  we  learn  that  (7.1)  preserves  upper  triangularity.  On  the  other  hand,  we  also 
have 

A^{t)  .4(0     =     Q,{lf  AlAoQi(t) 

=     RAt)  AlAoiRiit))-'  ,  (7.7) 

so  that  ^-^(0  .4(0  is  upper  Hessenberg,  and  hence  tridiagonal,  by  symmetry.  It  follows  that 
A{t)  is  bidiagonaJ.  Furthermore,  it  follows  from  (7.6)  and  (7.7)  that 

sgn  .4,.(0  =  sgn(/4o)..  ,      sgn  -4,,+i(t)  =  sgn(/lo)..+i  ■  (7.8) 

In  particular,  if  a^  and  6,  are  positive  initially,  they  are  positive  for  all  time. 

Finally,  the  preservation  of  singular  values  is  immediate  from  (7.5),  and  this  proves  the 
theorem.  D 

The  above  result  is  due  to  Chu  [Cbu86j,  and  is  modeled  on  related  results  ([SymSO], 
[Sym82])  and  Deift-Nanda  Tomei  (DNTS3],  for  the  symmetric  eigenvalue  problem.  The 
relationship  between  the  singular  value  flow  (7.1)  and  Toda-type  eigenvalue  flows  ([Sym80]. 
[Sym82].  [DNT83],  [DLNT86],  [DLT89],  [VVat84]),  is  described  by  the  following  theorem, 
whose  proof  is  immediate. 

Theorem  7.9    Under  the  map 

A^T(A)  =  A^A  (7.10) 


equation  (7.1)  u<:  transformed  into 


^  =  [r,  MFiT))]  .  (7.11) 


0     A'^  \ 
Remark  7.12  The  perfect  shuffle  ^  ""*    (    4     f^       I    '-'   S(A)  of  Section  2,  transforms 

the  singular  value  problem  for  A  into  an  eigenvalue  problem  for  5.  One  naight  ask  what 
happens  to  (7.1)  under  the  map  A  ^  S.  One  finds  that  (7.1)  is  transformed  into  the 
Toda-type  isospectral  deformation 

^  =  [5,  iroiF(S'))]  .  (7.13) 

Of  particular  interest  is  the  case  F{x)  =  x.  Here  (7.11)  becomes  the  Toda  flow,  ^  = 
[T,no{T)],  but  (7.13)  reduces  to  ^  =  [5, 7ro(5^)];  in  the  case  where  A  is  bidiagonal  and  5 
takes  the  form  (2.2),  this  is  the  so-called  Kac-van  Moerbeke  lattice  [KvM75]. 

The  flow  that  is  directly  related  to  the  SVD  algorithm  corresponds  to  the  choice  F{x)  = 
log  J.  To  see  this  note  that  (7.11)  becomes 

—  =  [T,7ro{logT)],         T{0)  =  aIAo, 
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whose  solution  is  given  by  (7.7), 


But  from  (7.3). 
and  so 

Thus 


T(t)  =  Qj{t)TiO)QAt)  . 

r(0)  =  e'°«^'°>  =  gi(l)i?i(l), 
T{l)=R,(l)Qi{l). 


A^(0)  A{0)  =  T(0)  -  r(l)  =  A(\fA{l) 
is  just  one  step  of  QR,  and  hence,  in  the  bidiagonal  case, 

A{0)^  A(l) 

is  one  step  of  SVD. 

In  summary,  we  have  proven  the  following  bjisic  result. 

Theorem  7.14  Let  Aq  be  bidiagonal  and  let  F{x)  =  logi.  Then  the  integer  time  eval- 
uation of  the  solution  A{t)  of  (7.1)  gives  precisely  the  iterates  of  the  SVD  algorithm, 
.4o.^i Ak Thus 


Aik)  =  Ak  ,         k  =  0.1,2, 


(7.15) 


We  will  call  the  flow  induced  by  (7.1)  in  the  case  F{x)  =  logi,  the  SVD  flow. 

In  the  classical  c<ise  of  the  Toda  flow,  where  T  is  tridiagonal  and  F(i)  =  i,  Moser 
[Mos75]  proved  the  remarkable  result  that  the  solution  7(0  of  (7.11)  converges  to  a  diagonal 
matrix  as  /  — >  oo.  The  Scime  is  true  if  T  is  a  full  symmetric  matrix,  provided  f{x)  is 
strictly  monotonic  on  spec  T  (see,  e.g.,  [DLT85]).  In  the  bidiagonal  case,  the  convergence 
of  T{t)  =  A{t)'^ A{t)  in  turn  implies  that  A{t)  also  converges  to  a  diagonal  matrix  (see 
[Chu86]).  For  the  reader's  convenience  we  will  present  a  (new)  proof  of  the  convergence 
of  .4(0.  calculating  en  route  the  leading  asymptotics  as  t  — •  oo.  By  Theorem  7.14,  this  of 
course  gives  an  independent  proof  of  the  convergence  of  SVD  and,  by  Theorem  7.9,  also 
QR. 

To  fix  notation,  let 


/  ci     6i 


A  =  B  = 


o    \ 


,      a,,6,  >  0  , 


6n-l 

V  O  an       / 

with  singular  values  CTi  >  crj  >  . . .  >  cTn  >  0,  and  let 

(  c^     d,  0       \ 

T  =  B^B  = 


di 


■•         ■•  dn-\ 

\  0       dn-i  c„    y 
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so  that 

,2    ,    l2 


c,  =  at  +  6;_i  ,         I  <i  <n  ,  (7.16) 

d,  =  a,6,  ,  1  <  !  <  n  -  1  .  (7.17) 

(Here  bo  =  0.)  Let  r,  =  (r;,(  1 ),...,  v,(n))-^,  1  <  t  <  n,  denote  the  unit  right  singular  vectors 
for  B,Tv,  =  crjv,,  normalized  so  that  f,(l)  >  0.  Set  6^  =  ^"Jj'  6,^. 

Theorem  7.18    (Asymptotics  for  SVD).  Let  B[t)  be  the  solution  of  (1.1)  with  5(0)  =  Bq 
btdtagonal  and  Fix)  increasing  on  spec  BE.    Then  as  t  —>  oo 

«/')  =  ^.(1  +  ^,,2^'     „2,(1  +  0{b'))  +  ,,/'-'     ,.(1  +  0(6^)))  (7.19) 

and 

^(n  -  f[n"^^1;/^b!]  [^]  e<^i'^w-(^^)"  (7.20) 

r/feren.<;-i(^^-^J)  =  l/orj  =  i;. 

Proof.    From  (7.7)  and  (7.3), 

r/1,0     =     (ei,Q[(Ovj) 

=     {[R,[t))-'eu  e'^(^°'i;,) 

"     (i?i(0)n"^^^^' 

and  hence 

e'^<'?'t;,(l) 

In  particular,  as  F{x)  is  increasing, 

t'j(l,0-<5ji  (7.22) 

as  <  —  oc. 

Now  recall  (see  e.g.  [GvL])  that  the  rows 

t;,(l)     1-2(1)     •••     t'„(l) 
i'i(2)     i'2(2)     •■•     f„(2) 


of  the  matrix  of  eigenvectors  V  for  the  tridiagonal  matrix  T  can  be  computed  by  applying 
the  Grajn-Schmidt  procedure  to  the  row  vectors 

t^i(l)  f2(l)  •■•    r„(l) 

a?ri(l)  a|t;2(l)  ...    tr^-^ll) 

2(k-\)      ,,,         2(k-\)       ,,,  2[k-\)       ... 

a^'       't'i(l)    a-j,'       'r2(l)  •••    c^n         Vn{\)  . 
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In  particular  if  v(lj)  =  (ui(l,0 r„(l,0).  v{2,t)  =  (ri(2,  <),•••- tv(2,  <))  denote  the 

first  two  rows  of  the  matrix  of  eigenvectors  V'(0  for  T{t)-  then 


l  =  (r(K/)Ar(2j),  r(l,  0  A  i'(2,  0)  =  J^  ( 


v,(2,0    t'fc(2,0 


But  by  Gram-Schmidt  r„(2,0  =  Xi{t)crivm{lJ)  +  Ht)Vmil,t)  where  Xi{t)  and  AjIO  are 
independent  of  m,  Ai{0  5^  0.  Using  (7.21)  we  find 


v,{2,t)     Vk{2,t) 


1       1 


fi(i.n  V2(ht] 


1    1 

2    2 
^1  ^2 


/r.(l)vfc(l)y 

U'i(i)t;2(i)y 


,2[F(,7l)+F(al)-F(^l)-F{^l)]t 


V,(2,t)     f2(2.n 
which,  by  the  monotonicity  of  F,  converges  to  zero  as  /  —  00,  unless  i  =  1  and  k  =  2.  Thus 


rid./)     V2(IJ) 
ri(2,n     t;2(2,0 


— '  ±1 


and  as  |i'i(2.0l  <  1  and  vjCl,/)  —  0  by  (7.22),  we  conclude  that 

i;,(2,0-±^2 
Continuing  by  induction  we  learn  that 

Vjikj)-^±Sjk 
as  /  —  oc,  and  hence 

T{t)  =  V{t)  L'V(t)  ^L'  =  diag(a? a^)  . 


(7.23) 


(7.24) 


(7.25) 


In  particular  d,{t)  =  a,{t)  6,(t)  ^  0.  But  det  7(0  =  (det  A{t)y  =  nr=i  a?(0  =  constant  ^ 
0.  as  (7.11)  is  isospectral,  and  max,  a^(t)  is  bounded  by  (7.5),  which  implies  ||/1(0||  =  Moll- 
It  follows  that  b,{t)  —  0,  and  from  (7.17)  (and  (7.8)), 


a,(0=^c.(/)-6,2_j(0-cr,  . 
The  eigenvalue  equation  for  T{t)  implies 

t'.(*  +  1,0  =  -|^^^det(a?  -  T(t)), 


(7.26) 


where  (cr^  -  T{t))k  h  k  x  k  matrix  formed  from  the  first  k  rows  and  columns  of  a^  -  T(t). 
In  particular,  setting  i  =  /:  +  1  and  using  (7.21),  (7.24),  (7.25)  in  (7.26),  we  find 


which  leads  directly  to  (7.20). 


To  obtain  the  asymptotics  for  a,(t),  conjugate  the  eigenvalue  equation  for  T  using  a 
diagonal  matrix  to  the  form 


1  (c2-al)     dl 


\ 


\ 


1       (c„_, -af)     dl_, 

1  (Cn-^?)/ 


/  ^.(1)  \ 


V  f.(i)  / 


=  0  . 


Applying  Cramer's  rule  to  the  leading  (z-l)x(t-l)  matrix  and  to  the  trailing  (n-i)x(n-i] 
matrix,  one  sees  easilv  that 


{•.(!-   l)/f,(t)  = 


dU 


cr.  —  (7, 


t-l 


•{l  +  OiY,d])) 


and 


r,(r  +  1)  /  v,(i)  = 


a*  -  a. 


1+1 


-{\^oC£.d])), 


respectively.  Inserting  these  relations  in  the  i     equation 


2          v,{x-  1)        2  ^'.('+  1) 
c,  -  c^  = rr-: a, 


t-,(0 


^(i; 


and  using  (7.16).  the  result  follows.       D 


Remark  7.27  From  (7.24).  v^ikj)  — •  ±fjk  as  t  — ►  oo.  The  choice  of  signs  can  be  de- 
termined from  (7.26).  Indeed  cis  /  —  oo,  Vk+i{k  +  l,t)  ~  (pos.)  x  nf=i(^A+i  ~  '^^}y  ^^^ 
so 

Remark  7.28  The  form  of  (7.19)  suggests  a  proof  using  more  standard  techniques  in  nu- 
merical analysis.  Indeed  (7.19)  can  easily  be  proved  by  applying  two  consecutive  Jacobi 
rotations  in  the  planes  (j  —  l,j)  and  (j,j  +  1)  respectively,  and  then  using  second  order 
perturbation  theory. 

Remark  7.29  Note  that  a  linearization  of  (7.1)  around  the  equilibrium  point  >l(oo)  = 

diag(ai (?„),  would  also  give  the  asymptotic  rates  (7.19)  and  (7.20),  but  without  the 

precise  constants. 

Remark  7.30  The  asymptotics  in  (7.20)  can  be  used  to  calculate  the  scattering  matrix  for 
the  classical  Toda  lattice  (see  [Mos75])  directly. 
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8     The  Hamiltonian  Structure  for  the  Flows 

In  this  section  we  show  that  equation  (7.1)  is  Hamiltonian.  More  precisely,  we  show  that 
there  is  a  Poisson  bracket  {^-Is,  the  Sklyanin  bracket,  defined  on  the  space  of  matrices, 
and  a  Hamiltonian  Up,  such  that  the  differential  equations  generated  by  {Up,  {•,  ■]s), 

j^<i>{Ait))  =  {4>,  HF}siA(t))  ,         A{0)  =  Ao  ,  (8.1) 

for  all  observables  </>,  are  equivalent  to  (7.1). 

A  general  reference  for  the  Hamiltonian  mechanics  used  in  this  paper  is  [Arn78];  a 
description  of  the  Sklyanin  bracket  can  be  found  in  [Sem84].  The  Sklyanin  bracket  can  be 
defined  in  great  generality  on  groups  and  on  associative  algebras  (see  also  [LPar]),  but  we 
will  restrict  ourselves  to  the  case  where  the  underlying  space  is  Af„(R),  the  algebra  of  real 
n  X  n  matrices. 

We  begin  with  some  notation  and  definitions.  The  space  Mn(R)  carries  a  natural 
(p/(n,R)-ad-)  invariant  pairing 

{A,B)  =  tT  AB  ,  (8.2) 

{A,[BX])=-{[B,A],C),  ■  (8.3) 

where  A,B,C  belong  to  M„(R)  and  [•,]  denotes  the  standard  commutator.  Denote  by 
A'a  (resp.  Xa)  the  (G/(n,R)-)  left-invariant  ((G/(n,R)-)  right-invariant,  resp.)  vector  field 
generated  by  a  G  A/„(R).  Thus  for  smooth  functions  4>  :  M„(R)  — ►  R, 

Xa4>{9)  =  j^U=o'i>(9e"')  =  {D'4>{g),a)  (8.4) 

^a<t>{9)  =  ^Uo<^(e"'5)  =  {D4>{g),a)  (8.5) 

for  all  5  €  M„(R).  Clearly 

D'<}>{9)  =  iV4>{g)fg  (8.6) 

D<t>{g)  =  g{V4>ig)f  ,  (8.7) 

where  V<t>{g)  is  the  matrix  with  entries  ^{g).  Standard  computations  for  Lie  brackets 
show  that 

[Xa,Xb]  =  X[^^b]      .  (8.8) 

[Xa,Xb]  =  -X^^^b],  (8.9) 

and 

[Xa,Xk]  =  0.  (8.10) 

Finally,  a  linear  map  R  :  Mn(R)  —  A/„(R)  is  said  to  solve  the  modified  Yang-Baxter 
equation  (mYB)  if 

[R{A),  R{B)]  -  R{[A,  R{B)]  +  [i?(^),  B])  =  -[A,  B]  (8.11) 

for  all  ^,  5  e  A/n(R).  (Such  an  R  is  an  example  of  a  classical  r-matrix  —  see  [Sem84]). 
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Theorem  8.12  Suppose  R  and  R'  are  skew  symmetric 

(A,R{B))  =  -{R(A),B),         {A,R'{B))  =  -{R'{A),B)  (8.13) 

solutions  of  {mY B).   Then 

{<i>.^)RM9)  =  {R{D<i>{9)),  D^{g))  +  {R'{D'4>{9)),  D'Mg))  (8.14) 

defines  a  Poisson  bracket  on  M„(R). 

Proof.    The  only  point  to  check  is  the  Jacobi  identity,  {<pi,  {<t>2y4>3}R,R']R,R>+  cyclic 
permutations  (c.p.)  =  0.  Set 

{4>,i;}rig)  =  {R{D4>(g)),Di'{g)) 
{4>,i,]i{g)  =  {R'{D'4>{g),  D'Hg))  . 

We  have 

{D'{<t>^,<h]i(9).a)    =     ||,=o{<^2,<^3M5e"') 

=     {R'{D'<t>2{9)),  j^\t=oD'4>3{ge"'))  -  ij^l.^o^'Mge'^.  R'[D'U9))) 

which  implies 

{<^l,{<^2,03}f}/  +  C.p.       =       ^(iE'(£>'cii),  D'{<^2,<^3}2)  +  C.p. 

=      ^R'(D'i,i)XR'(D'<t>2)'i^  -  ^R'(D'<t>i)^R'(D'i,z]<t>2  +  C.p. 
=      [^R'(D'<t>2)^^R'(D'4'3)\^\  +  C.p. 

=     X[R'(D'<t>-i),R'(D'4>3)]<t>\  +  C.p.  ,       by  (8.8), 
=    (Z)>i,  [R'[D'<t>2),  R\D'<h)])  +  c.p. 

Similarly 

{01,{</>2,<^3}r}r  =   -(Z?^l ,  [iJ(/?<^2), -R(£><^)])  +  C.p., 


and 


{4>\,{<t>2,<h]T]f\- {4>\.{4>2.<h]t)T     =     [A'/?'(D'02),A'fl(£)^3)]0i  +  c.p. 

=     0,  by  (8.10), 


Thus 


{<f>i,{'^2,<t>3)R,R']R,R'  +  CP-      =      {4>lA<h,<h}t}(+  {4>lA'^2,<h]T}r  +C.p. 

=    (D'4>u[R'(D'<h), R'{D'<h)])  -  (D4>i,[RiD<h), RiD<t>3)])  +  c.p. 
=    -{D'4>u[D'4>2.D'<t>3])  +  iD4>,,[D4>2,D<h])  , 

where  in  the  last  step  we  have  used  (mYB),  (8.13)  and  (8.3).  Direct  substitution  of  (8.6) 
and  (8.7)  show  that  the  last  two  terms  cancel,  and  this  proves  the  theorem.  D 
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Set 

R{A)  =  A+-A-,         R'iA)  =  -RiA)  (8.15) 

where  ^4+  is  the  strict  upper  part  of  A  and  A-  is  strict  lower  part  of  i4  as  before.    A 
straightforward  calculation  shows  that  R  and  R'  are  skew  and  solve  {mY B). 
For  SVD,  the  Sklyanin  bracket  is  defined  by 

{<}>,i']s{A)    =     {4>,tl>)R,.R(A) 

=     {RiD<t>(A)),Drl^iA))-{R{D'^{A)),{D'rP{A)))  (8.16) 

Remark  8.17  The  R  matrix  in  (8.15)  also  arises  in  the  study  of  the  Cholesky  eigenvalue 
algorithm  (see  [DLT89]). 

By  Theorem  8.12,  {■,}s  gives  a  Poisson  bracket  on  M„(R).   But  more  is  true:   {■,-}s 
restricts  as  a  Poisson  bracket  to  the  submanifolds 

{A  6  A/„(R)  -.del  A  =  c} 

for  any  constant  c  ^  0.    Indeed,  from  the  formula  (Vlogdet  ,4)^  =  A~'^,  we  see  that 
D\ogdei{A)  =  D'\ogdei(A)  =  /,  which  implies  ;i!(Z)logdet(/i))  =  iZ(Z?'Iogdet  A)  =  0. 
Hence  {det,(;&}s  =  0  for  all  functions  4>. 
We  now  show  that  (7.1)  is  Hamiltonian. 

Theorem  8.18  Let  F{x)  be  a  smooth  real- valued  function  on  (0,oo)  and  let  Gf{x)  = 
-  f    27-'^^  ^  °  primitive  of  -F{x)/2x.   Then  the  equation 

J^<t>{A{t))  =  {<p,  nF}s{Ait))  ,         A{0)  =  Ao  , 

generated  by  Hf{A)  =  trGfiA^A)  on  {A  €  M„(R)  :  det  A  =    det  Aq  #  0},  is  equivalent 
to  (7.1). 

Proof.    From  (8.17), 

<i>  =  {</>,  nF}s  =  -{V4,^iA),  iR{AVHf{A)))A)  +  (V<^^(A),  AR{VHj:iA)A))  , 

so  that 

A  =  A  R{Vnl{A)  A)  -  {R[A  VHf{A)))  A  .  (8.19) 

But  by  differentiation, 

VHf{A)    =    2AG'FiA^A) 

=     -AiA'^A)-'^  FiA'^A) 
=     -{A^r'FiA^A), 

which  implies 

A  =  -A  RiFiA'^A))  +  R(F(AA^))  A  . 
As  To(5)  =  -R{S)  for  any  symmetric  matrix  S,  this  proves  the  theorem.       D 
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In  particular 

HsvDiA)  =  -]u(\og{A'^A))^  (8.20) 

4 

generates  the  S\'D  flow.  Also,  H  =  -^  triA'^A)  generatess  the  Toda-SVD  flow  in  [Chu86]. 
The  bracket  {•,  ■}s  is  highly  degnerate  and  the  determination  of  the  associated  symplec- 
tic  leaves  (see  [WeiSS],  [Sem85])  is  in  general  extremely  difficult.    We  have,  however,  the 
following  happy  fact. 

Theorem  8.21    The  set  B^  of  bidiagonal  matrices  B  with  positive  entries  a.p,bq  and  fixed 
determinant, 

n 

det  B  =Ylap  =  A  (8.22) 

p=i 

is  a  (2n  —  2) -dimensional  symplectic  leaf  for  the  Sklyanin  bracket  {■,  ■}s-  Moreover,  B^^  has 
a  global  Darboux  coordinate  system  given  by 

I.  =  log6,  ,  1  <  i  <  n  -  1  ,  (8.23) 

y.  ^logj^a,  ,     1  <  2  <  n-  1  ,  (8.24) 

{j,,ij}s  =  0,      {!/.,y;}s  =  0.      {x„y,}s^6,j.  (8.25) 

Proof.    We  compute  {(i>,r!:}s(B)  at  a  bidiagonal  matrix  B.  Set  T](m)  =  -1,0, 1  if  m  is 
negative,  zero,  or  positive  respectively. 
Insert  the  formulae 

(5V(^^(5.)).j     =     a,Oj,  +  b,<t>,,,  +  , 
(R{BV<t>^{B))),,     =     r;(;-.)(a.c>j, +  6.^;..  +  i) 

(i?(V0^(5)B))„     =     rj(j-t){a,0„  +  bj.i4>j.i,,) 
and  their  analogs  for  ip,  into  (8.17),  to  obtain  after  some  algebra 

{0,C}s(5)     =     '^{T]{j-t)-T]{j-\-l-i))a,bjOjX\,j+i 

'J 

+  H  ivU  -  i)  -  VU  +  1  -  i))ajb,  <Pj,,+ii\j 
>.j 

Changing  variables,  a,  6  — »  i,  y,  on  5^  now  leads  to 

(*,.}5(B,  =  (*.*h(B(x,v))  =  |'(^|i:-|g),  (8.26, 

which  is  the  canonical,  non-degenerate  Poisson  bracket  on  R^"~^,  and  (8.25)  follows.       D 
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For  later  convenience,  we  intoduce  the  notation 


y„  =  logJJaj  =  logA 


(8.27) 


Remark  8.28  Formula  (8.26)  makes  explicit  the  fact  that  B^  is  a  sympletic  leaf  of  {•,  ■}s. 
In  particular  {4>,rp}s(B{x,y))  depends  on/yon  the  values  oi  4>  and  V  on  B^. 

Remark  8.29  The  fact  that  5^  is  a  natural  phase  space  for  the  Hamiltonian  version  of 
SVD,  is  in  striking  parallel  to  the  fact  that  Tr,  the  tridiagonal  matrices  (with  nonzero  off 
diagonal  elements  and)  with  prescribed  trace  r,  provides  a  natural  phase  space  for  the 
Hamiltonian  version  of  QR.  The  relevant  Poisson  structure  for  QR  is  given  by  the  Lie- 
Poisson  structure  on  the  dual  of  the  Lie  algebra  of  the  lower  triangular  group  (see  [Kos79], 
[Adl79];  see  also  [DLNT86]).  However,  the  map  B  i->  B^ B  from  6^  to  Tj  is  not  symplectic 
and  the  relationship  between  the  two  Poisson  structures  is  not  clear.  On  the  other  hand  the 
perfect  shuffle  5  •-►  5  of  Section  2,  induces  a  Poisson  structure  on  the  space  of  tridiagonal 
matrices  S,  of  type  (2.2),  with  fixed  determinant.  In  particular,  this  shows  that  the  Kac-van 
Moerbeke  lattice  (see  Remark  7.12)  is  Hamiltonian  on  the  space  of  such  matrices  S. 

We  conclude  this  section  by  proving  Fact  1  of  the  Introduction.  Thus,  if  M(j,  t)  is  the  Ja- 
cobian  of  the  iterated  SVD  map  from  fi,  to  Bj  expressed  in  the  variables  log  6i , . .  . ,  log  bn-\ , 
logai,...,loga„,  then 

A  e  spec  M{j,i)  o  A~^  6   spec  M{j,i)  . 


For  a  bidiagonal  matrix  B  set 


A  =  log6.  ,  l<i<n-l, 

a,  =  log  a,  ,         1  <  t  <  n  • 


so  that 


where  N  is  the  n  x  n  matrix 


Let 


^  a-  j        (^  0     TV  j    (  y'  j'      (  QJ  j  -  (  0     TV 
the  t*''  and  j'*>  SVD  iterates  B,  and  Bj,  with  t  <  ;.  Then 


/  1 

1     1 


.  Note  A^-i  = 


(8.30) 
(8.31) 

(8.32) 
0\ 


y' 


1 
V  1    1        11/ 

be  the  coordinates  of 


^^'  '~  d{p\Q')       [on]  d{x\y>)  [  0     A'-i 


(8.33) 
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Under  the  iteration,  y^  =  logdet  B^  =  logdet  B*  =  y^.  Hence 


/ 


d{x',yn 
d{x\y^) 


9(^ ^n-i.W vi,-,) 


9(i\<- 


.\'Vl l-n-l) 


\ 


O-O 


5ir 


1 


(8.34) 


/ 


Now  recall  the  following  standard  fact  from  Hamiltonian  mechanics  ([Ar]):   let  J  de- 

0    / 


note  the  standard  matrix 


-/     0 


and  suppose  (z(<;  i,  y),  y{t;  x,  y))  is  the  solution  of  a 


Hamiltonian  system  of  equations  in  R^""^  in  canonical  form 


(i(0;i,y),  y(0;i,y))=  (x,y)  , 


(8.35) 


for  some  Hamiltonian  H  :  R^"-^  ^  R  Then  for  any  t,  the  Jacobian  D  =  ^(^('^^.y).y((;x.v))  -^ 

symplectic,  i.e.  D^ J  D  =  J.  But  det(£>^  -  A)  =  det(yD-»  J"'  -  A)  =  det(£)-^  -  A).  Thus 
if  D  is  symplectic. 


A  6  spec  D  «•  A   '  e  spec  D 


(8.36) 


Finally  from  Theorem  7.14,  (7.18)  and  (8.21),  i{, . .  .,i^_i,  y{,...,yi_■^  is  the  time/  = 
J  —  :  evaluation  in  canonical  variables  of  a  Hamiltonian  flow  with  initial  data  x\ x'    , , 


!/{»•••»  y'n-i  ■  Hence  the  matrix 


9(^ ^n-l.W VJl-l) 


9(rl, 


■•n-l'V;. 


..) 


is  symplectic  and  (8.36)  holds.  Fact  1 


now  follows  from  (8.33)  and  (8.34). 


Remark  8.37  The  Hamiltonian  HsvD  on  the  (2n-2)-dimensionaJ  leaf  5a  is  completely 
integrable  in  the  sense  of  Liouville.  The  commuting  integrals  are  the  singular  values 
cTi, . .  .,(7,1-1  (reccdl  nr=i  <7i  =  A  is  a  Casimir),  and  the  associated  angles  are  suitable  com- 
binations of  the  logarithms  of  the  first  components  of  the  unit  singular  vectors.  We  leave 
the  details  to  the  interested  reader  (cf.  [Mos75],  [DLNT86],  for  example). 
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9     Asymptotics  of  ||M(j,  f)|| 


By  the  results  of  Sections  7  and  8,  the  SVD  flow  on  bidiagonal  matrices  B  = 


takes  the  form 

dx,  dHsvD 


(  a\     h 


\o 


0     \ 


dt 
dy, 
dt 
dyn 
dt 


dy, 
dHsvD 
dxi 


(ii,...,y„_i;y„)  ,     1  <  t  <  n  -  1  , 


(ii,...,y„_i;y„)  ,      1  <  t  <  n  -  1  , 


(9.1) 


=     0 


1.(0)     =     X,,     y,(0)  =  y,  ,      l<J<n-l,     y„(0)  =  y„(t)  =  y„  , 

in  the  canonical  coordinates  i,  =  log 6,,  y,  =  lognj=i  °-]^  where  H syd  =  -  4tr(log(5-^(i,  y)B(i,y)))'^. 
This  leads  to  the  equation 


|^'-(0       ^"^^^'S)^n,A-„(0)  =  /, 


(9.2) 


for  the  fuU  (2n  -  1)  x  (2n  -  1)  Jacobian  matrix  A'„(t)  =  3(^i(y^-.^n-i(t).!/i(t).^..vn(())^  ^^^^^ 
J  is  again  the  standard  (2n  -  2)  x  (2n  -  2)  matrix  j        r     ^    )  ^^'^  "^n^svD  is  the  (2ti  - 


2)  X  (2n  -  1)  Hessian  matrix 


^n^SVD  = 


d'HsvD 


dz,dz,    I 

'  l<i<2n-2,l<j<2n-l 

(^l,---,^2n-l)  =  (Xl,.--,In-l,yi yn)   ■ 


(9.3) 


(9.4) 


Our  goal  in  this  section  is  to  evaluate  A'„(0  as  <  —  oo.  By  (8.33),  the  asymptotics  for 
M{j,i)  will  then  follow.  Inserting  F{x)  =  logi  in  Theorem  7.18,  we  obtain 


and 


a,{t)  =  <T,  +  0(6^)  ,      I  <i  <n 


6.(0  ~6r(^^)",      l<i<n-l 


(9.5) 
(9.6) 


as  f  —  DO,  where  6°°  >  0  and  6^  =  '£J'^Zl  b]  as  before. 

A  convenient  formula  for  Hsvd  is  given  by  the  spectral  representation 

(log  5)^     ds 


tr(logB-5)^  =  tr   /^^^, 
Jc  s  -  B^  B  2ni 


(9.7) 
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where  C  is  the  counterclockwise  contour 


from  which  we  obtain,  after  one  integration  by  parts, 


d^ 


-^-tr(log  5^5)2  =  tr    / 

jOZk  Jc 


logs 


C      S 


d^B^B         1  dB'^B         1         dB'^B         1 

+  - 


ds 


dzjdzk  s-  B^B       dzj     s  -  B^ B     dzk     s  -  B^ B 

(9.8) 
Now  dB/dxm  =  ^m.m+if>m,  I  <  TTi  <  Tt  —  I,  where  e,j  is  the  standard  n  x  n  matrix  with 
1  in  the  (i,j)  position,  and  zero  elsewhere.  Thus  if  Zj  or  Zk  lies  in  the  set  {xi, ...,  !„_]}, 
then  g^  g^    tr(log5-^5)^  is  of  order  6,  and  hence  is  exponenticJly  decrezising.  (Here  we  use 

(9.5)  and  (9.6)  to  bound  max^gc  11(5"  -  5^5)"' ||,  etc.) 

The  leading  order  contribution  comes  from  the  derivatives  d^ jdy^dyk-  We  find 


dB^B 


=  2a]e,,-2a]^,e,+,,,+,+0{h) 


(9.9) 


and 


dxjjdyk 


+     (5_,_i,*(-4a2)e_,_,  +  <5j+i,t(-4  a^+i)e_,+i,j+i 
+     0(6)  , 


(9.10) 


where  a„+i  =  0,  etc.  Substituting  (9.9)  and  (9.10)  in  (9.8),  we  find 


a' 


dyjdyi 


-tr(logB^5)  = 


"i 


logs 


6jk{4<7]ejj  +  4aJ+iej+i,_,+i) 


\       0 

o 


2\-\ 


0       \ 


{s-al)-^  J 


+^j_i,fc(-4c72)e.j 


0  is-air' 

i^-<^h''  o 


\ 


+<5j+i,*(-4(7j+i)ej+ij+i 


+  (2(72e,,  -2^2^ iCj +  !,,  +  !  ) 


o 

\     o 


o 


{s-O 


2\-\ 
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x[2al€kk  -2cTfc^.iejt+i>+i) 


{s-aD 


O 


2\-l 


0 


ds 


+0(6)  . 
After  performing  the  integrals,  this  leads  to 


dy^dyk 
and  hence 


tT(\ogB^B)'^  =  16  6jk- 8  6j.yk-S6j+i,k  + 0{b)  ,      l<j<n-\,      I  <  k  <  n  , 


0     ...         0  [  0„,„_i        0„,„  ^  ' 


(9.ii; 


where  L  is  the  (n  -  1)  x  n  matrix 


/  -4        2 

2-4        2 
2     -4 


1  = 


0\ 


V  O 

Equation  (9.2)  now  takes  the  form 

dKn       {  0    L 


(9.12) 


dt 


where  by  (9.6), 


0    0 


=-«« 


2-4        2 

2-4       2  / 


i^„  +  C(<)A'n  ,     A'„(0)  =  /  , 


||C(OII  <  coe-*'  ,      6  =      min     log(a//<,)  >  0  , 

l<i<n-l 


(9.13) 


(9.14) 


for  some  positive  constant  cq.   Rewriting  (9.13)  in  the  standard  way  (see  e.g.    [CL55])  in 
integral  form,  we  obtain  after  iteration 


'■«>  ■  •■••■"(i;  t  )"••"» 


-( 


Au  +  tLA2i    Ai2  +  tLA 


122 


l21 


l22 


(l  +  o(l)) 


(9.15) 


for  suitable  constant  matrices  A,j.  Here  the  terms  o(l)  are  exponentially  decreasing.  Finally, 
a£  in  (8.33), 

^,,,0)  =  mh^  =  I  «..  +;j"^-    B'^  +;j"fi"  )  „  +  ^,),  (9.16, 
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for  suitable  B,j,  where  r„  is  the  (n-  1)  x  n  matrix, 


r„  =  X7V-'  = 


/  -2        2 

-2    2 


^    O 


0\ 


-2      2  / 


(9.i: 


which  appears  in  (1.9). 

To  compute  521,^22,  note  that  for  any  t, 

ai{B')  =  MB)  ,      l<e<n, 

where  B'  =  B{t).  Thus  for  1  <  m  <  n  -  1,  1  <  ^  <  n, 


frt      5a;       '5/3„ 


1=1 


56'       'dp,,       d3m  ■ 


(9.18) 


By  regular  perturbation  theory  applied  to  the  perfect  shuffle  5  of  B  (see  (2.2)),  we  see  that 
(in  the  notation  of  Section  2) 

^     =     2hn2i-l)kti2^) 

=     ^'M  v'lii)  , 

where  u^,  V(  are  the  unit  eigenvectors  of  B'{B')^  and  {B')^B'  respectively,  chosen  such  that 
u^(l),v;(l)  >  0.  But  by  Remark  7.27,  v'f{t)  -  (-l)'+i^;,  as  <  ^  oo.  A  similar  analysis 
shows  that  the  same  is  true  for  u'((i);  hence  ^"^^J  ^  —  6(,.  On  the  other  hand,  by  (9.16), 
dp'Jdpm  grows  at  worst  linearly  as  <  —  oo,  but  b[  decreases  exponentially  and  da((B')/db', 
is  bounded,  again  by  regular  perturbation  theory.  Inserting  this  information  into  (9.18),  we 
learn  that 


lim 


Thus 


and  similarly 


Also 


B21  = 


B22  = 


da'f  _    1     da( 

5  log  (7,  ' 


5/3m      /  l<:<n,  l<m<n-l 

d  log  cr, ' 


m     /  l<i<n,  l<m<n 


^nB21  = 


W: 


(9.19) 


(9.20) 


(9.21) 
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^  /    aiogtT^/g?  aiog-^^/g?     \ 


^nB22  = 


daj  '  '  '  don 


\  da  I  don  I 


(9.22) 


Recalling  that  for  any  matrix  A,  ||/l||oo  =  max,  YL^  I'^.J,  we  have  obtained  the  following 
theorem. 

Theorem  9.23  Vndtr  the  bidiagonal  SVD  flow  (9.1),  the  Jacobian  matrix  M{t,0)  = 
m^,  sat^sfies 

\  ^21  -022  / 

where  the  term  o(l)  is  exponentially  decreasing  as  tj  —  ti  — ►  oo,  and  the  constant  matrices 
B21,  B22,  r„52i,  TnB22  satisfy  (9.19)-(9.22)  respectively. 
Also 

||M(<,0)||oo<(8n-4)<  +  O(l)  (9.24) 

as  t  —>  00. 

Proof.    The  point  to  note  is  that,  by  (the  proof  of  the)  inequality  (2.4),  the  entries  of 
B2\  and  B22  are  bounded  by  1  and  the  entries  of  r„52i,  r„522  are  bounded  by  4.       D 

Remark  9.25  From  (9.5)  and  (9.6), 

a,(0  =  loga,(0  =  log(7.  +  o(l)  ,      l<t<n,  (9.26) 

and 

;9.(t)  =  log6.(t)  =  tlog(a,2+i/(7?)  +  0(l),      l<t<n-l,  (9.27) 

The  content  of  Theorem  9.23  is  that  the  leading  asymptotics  can  be  differentiated  with 
respect  to  the  initial  data.  This  in  turn  suggests  an  alternative  proof  of  Theorem  9.23:  if 
(9.26)  and  (9.27)  can  be  shown  to  hold  uniformaly  for  all  initial  data  in  a  complex  neighbor- 
hood of  B,  then  Theorem  9.23  follows  immediately  from  Cauchy's  formula.  This  approach 
can  indeed  be  carried  out,  but  we  present  no  detjiils.  (In  this  connection  we  refer  the  reader 
to  [Mos75],  where  an  analysis  of  the  asmptotics  of  the  classical  Toda  lattice  (tridiagional, 
F{x)  =  x)  with  complex  initial  data,  is  presented.)  In  Section  10,  however,  we  will  present 
(the  outline  of)  a  third  proof  of  Theorem  9.23  using  iterates  of  the  gradient  of  one  step  of 
the  SVD  aJgorithm,  and  which  does  not  utilize  the  underlying  flows. 

Finally  we  note  that  the  proof  of  (9.24)  shows  that,  more  generaUy, 

l|A^(<2,<i)IU  <(8n-4)(<2-<i)  +  0(l)  ,  (9.24)' 

as  <2  -  '1  —  00.  Moreover  the  estimate  is  uniform  in  <2  >  '1  >  0.  Recalling  yet  again  the 
relationship  between  the  SVD  flow  and  the  SVD  algorithm,  Fact  4  in  the  introduction  is 
finally  proven  by  setting  <i  =  i  and  <2  =  j  in  (9.24)'. 
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10     The  Spectrum  of  the  One-step  Jacobian  of  SVD 

We  consider  one  step  of  SVD  taking  the  bidiagonal  matrix  B  to  the  bidiagonal  matrix  B' . 
Our  goal  in  this  section  is  to  analyze  the  (2n  -  2)  x  (2n  -  2)  Jacobian  K  of  the  map  B  — ►  B' , 
expressed  in  terms  of  i,y  variables, 

J.  _  g(x'i,...,i;_i,y;,...,y;_i) 

5(ii,. .  .,i„_i,yi,. .  .,y„-i) 

/    dz'       dx' 
—       I      dx        dy 

and,  in  particular,  to  prove  Fact  2  and  Fact  3  of  the  Introduction.  Observe  that  K  is  the 
leading  (2n  -  2)  x  (2n  -  2)  submatrix  of  the  full  Jacobian  A'„(t  =  1)  of  Section  9;  K  is 
symplectic  by  the  results  of  Section  8. 
We  will  use  the  notation 

t 

A,  =  Y[a,=e^',      \<i<n  .  (10.1) 

As  before  6^  =  e^',  1  <  i  <  n  -  1. 

Our  first  result  is  formula  (10.19)  below,  which  computes  A'  to  relevant  orders  in  6,. 

The  SVD  algorithm  can  be  implemented  by  applying  a  sequence  of  2  x  2  rotations  to 
the  matrix  B  (see  [GVL83]).  A  straightforward  induction  using  these  rotations  leads  to  the 
following  formulae  for  the  entries  of  B' . 

A'^  =  s,Itj,      \<j<n,  (10.2) 

6;  =  ^^^3j_i6ja,+i  ,      l<j<n-l,  (10.3) 

where  Tj,  5^  satisfy  the  recurrences 

'■?+i  =  ^?+i  +  ^'''?+l  ,    t>o,  (10.4) 

and 

5?+i  ='-'+i  +  5?''?+ia?+2  >      »>0,  (10.5) 

where  tq  =  5o  =  1. 

These  recurrence  relations  imply 


and 


'i=E^H  n  f^i)  (10-6) 


j=0        ^        ^J+1         ^ 


where  nA=j+i  t^  =  1  if  j  =  i.  Observe  that 

t'\  =  t]{Au....A,M.-..X)  (10.8) 
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5?  =  5?(/l,,...,A.+i,6„...,6,),  (10.9) 
and  hence  from  (10.2)  and  (10.3), 

i:  =  i:(ii,...,x.+i,yi y.+i)  (10.10) 

y;  =  y,'(ii,...,i.,y,,...,y.+i).  (10.11) 
Expansion  to  third  order  in  6^  =  Yi7=\  ^h  yields 

^'  =  -4?  +  Al,b^  +  >lL2''?-i''?  +  0{b^)  ,  (10.12) 

5?=     Af  +  (2^M?-i+^^i-4?+,/ir')''?  +  (2^M?-2  MO  13) 

r?5?  =     ^f  +  (3.4f^?_i  +  ^^,>l?+,)6?  +  (3^M?-2  +  2-4?_i^?_,A?^i 

r?+i52_j  =     Al.Al,  +  (2.4?_i/l?_2/l?+i  +  >lf_2^M?+i -4.--!  )'>?-! 

+{A^AU)bl,  +  {Al,AU)l>U,  +  (2>l?_,A?_3^?^i  +  2Al,AUA^Al,A-J, 
+  Al,A^Al,A:_\)bl,bl,  +  (2A?_,  A?_,>4? 
+  Al,AlA:.\)bl,bl,  +  Af_,b^bl,  +  O(b')  .   ■ 

(10.15) 
We  first  compute  dx'/dx,  which  is  lower  Hessenberg  by  (10.10).  From  (10.3) 

Using  (10.15)  and  (10.16),  and  the  fact  that 

d        ,  2 

-T — (anything)  =  (somethJng)6,  , 
dx, 


we  obtain 


(^-)- 


+  ^+0(t'))  +0(62)) 

bl(7J  +  i  +  0ib^))    -62(^+^  +  0(62))  6^(^  +  0(62)) 

+  fl  +  0(62))  +k+0(62)) 

0(6262)  62(^^£|  +  0(62)) 

+  ^  +0(62)) 


0 

bl.ii^  +  0{b')) 


'n-l 


0(6262)  0(6262)  . . .  62.2(;^-  +  ^  +  0(62))     _fe2     (^  +  ^  +  0(62))    I 

(10.17) 
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9y'  :, 


Similar  formulae  can  be  obtained  for  ^,  |^  and  •#-.  In  particular  -^  is  lower  triangular 


and 


dx 


(       P\b]  0 

0{h\hl)  p\hl 


\0{b\bl_,)    0{hlbl_,)    ...    pl.,hl_,  I 


where 


P.  =  \/^  +  ^+0(&2),      \<i<n-l 


The  final  formulae  can  be  written  in  the  form 


^^  =  '+(^(1+^1 


52  E2  +  Ooib^) 

BEiB)B  B^E4 


where 


£2  = 


/  -4        2 
2     -4 

*^    O 


B  =  diag(pi6i,. .  .,pn_i6„_i)  , 


,     Ooibj)  is  lower  Hessenberg, 


2     -4  J 


and 


El,  £4      are  lower  Hessenberg  with  entries 
(£1 ),,,+!,  (£4).',i+i  of  the  form 
(positive  constant   +0(6^)), 

£■3  is  strictly  lower  triangular. 
Note  first  that  (10.20)  immediately  proves  Fact  5  of  the  Introduction, 


A  — '  Aqo  = 


1     £2 
0      1 


(10.18) 


(10.19) 

(10.20) 
(10.21) 

(10.22) 


(10.23) 


(10.24) 


as  t  =  fc  — '  00.  Formula  (10.20)  can  also  be  used  to  give  an  alternative  proof  of  the 
asymptotics  of  Kn(t),  as  mentioned  in  Remark  9.25.  Let  A'(j)  be  the  leading  (2n  -  2)  x 
(2n  —  2)  submatrix  of  A'„(<)  evaluated  at  time  i  =  j.  We  will  show  that 


A'(J)  = 


A\i  +  jE2A\2      ^'12  +  J  ■£^2>l22 


^21 


A', 


22 


(l  +  o(l)), 


(10.25) 


and  leave  the  (rather  lengthy)  remaining  details  to  the  reader.  From  (10.20)  and  (9.6)  we 
have  for  k  large 

K{k  +  l,k)  =  A'oo  +  ffc  , 
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where 


Writing 


But 


Ik;  II  <c/^i     0<p<l,     c  constant. 


for  large  k.  Thus 


21      ^22    /         ■'-°°i=o 


exists  and  the  convergence  is  exponential.  Formula  (10.25)  now  follows. 

Our  next  result  towards  the  proof  of  Fact  2  shows  that  the  eigenvalues  A^,  1  <  j  <  2n-2. 
of  K  =  K{t  +  \,t)  eventually  lie  in  Gershgorin-type  disks  contained  in  a  fixed  wedge  with 
vertex  A  =  1,  symmetric  about  1  +  lE,  and  with  aperture  26  less  than  tt. 


Figure  1: 

In  particular  (spec  K[t  +  \J))  n  Ris  eventually  empty. 

Observe  first  that  K  -  I  =  K(t  +  l.t)  -  I  can  be  rewritten  as 


K  -I  = 


1  +  BE3B  BE4 


(10.261 
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so  that  the  eigenvalue  problem 


reduces  to  the  system 


where 


'"-"{')=%)'  '-''-' 


{l  +  BE2B)f  =  {uB-'  -BE,)g 


.=A-:  and  r)  =  rfWo. 


(10.27) 

(10.28) 
(10.29) 

(10.30) 


For  t  —  cxD,  K{t  +  1, 0  -  (   i     f  ^    1 1  and  so 


J/-  0 


(10.31) 


As  E3  is  strictly  lower  triangular,  (1  +  BEzB)~^  exists  (even  if  the  bj  are  not  small)  and 
so  5  ^  0.  We  normalize 

ll5ll'  =  El5,l'  =  l.        '  (10.32) 

We  have 

(£2  +  0Q{b'^))9    =  (i/5-^)(l  +  BE3B)-\uB)-^g  +  E^B{1  +  BE3B]-\BE4)g 
-ExB{\  +  BEiB)-\vB-')g  -  {uB-^){\  +  BEzB)-^{BEi)g 
=  I  +  II  +  III  +  IV  . 


Now 


=  (1/5-1)2  +  0i(,.2) 


where  Oi(i/2)  is  strictly  lower  triangular, 


where  we  note  that 


EiB'E4  =  0'2ib^)  + 


II  =  02ib*)  +  E.B^E^  , 
/  0    0    Oib])        0 


(10.33)/ 


(10.33)// 


0    0        0        0{bl)     0 


V 


/ 


,02(6)  is  lower  Hessenberg, 


III=03M, 


and 


IV  =  0,{u)  , 
where  03(1/),  O^ii/)  are  again  lower  Hessenberg. 


(10.33)/// 
(10.33)/v 
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Thus  the  eigenvalue  problem  becomes 

(£2  -  {i^B-' )^)g  =  (Oi(m')  +  02(6")  -  Oo{b^)  +  EiB^E,  +  Osiu)  +  O^Ci/))^  ■      (10.34) 
For  our  present  purposes  all  we  need  is  that 

iEi-{uB-')^)9  =  0sil),     ||ffli  =  l, 
which  implies 


(10.35) 


min 


in,l4  +  (t//P;6,)2|     <  (E?=iM-4-(WP;6,)2|2|g,|2^ 
<liy||2  +  05(l), 


1/2 


where  1'  is  the  (n  -  1)  x  (n  -  1)  matrix 


Y  = 


/  0    2 
2 


Vo 


0  \ 


2 

oy 


||y||2  =  4cos-  <  4 
n 


(10.36) 


We  conclude,  finally,  that  if  1/  is  an  eigenvalue  of  A'((  +  IJ)-I,  then  eventually  i>  must 
lie  in  one  of  the  2(n  -  2)  Gershgorin-type  disks  Dj*^,  1  <  j  <  n  -  1, 


where 


{z:\A  +  iz/p^f\  <pn}  , 


|y||2<p„  =  i(||y||2  +  4)<4 


(10.37), 
(10.38) 


In  particular,  this  establishes  Fig.  1  with  sin  2^  =  Pn/4. 

Inserting  a  free  parameter  5,  0  <  5  <  1,  in  (10.26),  as  follows, 


A-  -  /  = 


1     0    W        sE,B         -41  +  siY  +  Ooib^)) 
0     B        [  I  +  SBE2B  SBE4 


B    0 

0      1 


simple  bookkeeping  shows  that  we  are  led  to  an  eigenvalue  problem  with  spectrum  lying  in 
the  same  disks  (10.37)j,  uniformly  for  0  <  5  <  1  as  t  — «■  00.  But  the  centers  of  the  disks 
are  the  eigenvalues  of  A'o  —  I  =  K,\,=q  —  I,  and  a  standard  continuity  argument  in  5  now 
shows  that  each  disk  Df  contains  one  eigenvalue  of  A'  -  /.  (If  two  disks  overlap  this  means 
that  their  union  contains  two  eigenvalues  o{  K  -  I ,  etc.) 
Recall  from  (9.6)  that 


6,(0 


br 


'7,  +  l 
Ox 


2t 


1  <  t'  <  n  -  1  , 


6^"  >0 


In  the  case  that  the  numbers  af+\/(7,.  1  <  J  <  n  -  1,  are  distinct  we  obtain  immediately 
a  proof  of  Fact  2.    For  in  this  case  the  disks  D^  are  eventuaJly  disjoint,  and  each  of  the 
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translated  disks  1  +  Df  contains  precisely  one  of  the  2(n  -  2)  eigenvalues  {A}  of  A'.  Thus 

the  eigenvalues  are  simple.  Suppose  A  G  1  +  D^  for  some  j.  Then  A~  is  also  an  eigenvalue 
by  Fact  1,  and  must  lie  in  the  same  disk,  by  a  simple  computation  using  (10.37)j.  By 
simplicity,  we  must  have  A  =  A     ,  i.e.  the  roots  lie  on  the  unit  circle. 

Remark  10.39.  The  idea  for  the  above  proof  of  Fact  2  was  suggested  to  the  authors  by 
Gene  Wayne  (cf.  [dlLW89]). 

The  remainder  of  this  section  is  devoted  to  proving  Fact  1  in  the  (nongeneric)  case  where 
the  numbers  a.+i/a,-  are  not  distinct. 

We  show  first  that  the  eigenvalues  \  -\-  u  are  eventually  geometrically  simple.    From 
(10.34),  (10.33)/-(10.33)/v,  we  see  that  the  eigenvcdue  equation  has  the  form 


-4- 


;?T^)'  +  ''(i) 


2  +  0(1) 
-'l-(^)^  +  «(l) 


0{hl)  0 

2  +  o(l)  0{bl)  0 

-^-(Flk)'  +  ''(1)     2  +  0(1)     0{l>\)    0 


\ 


(91       \ 


=  0 


/ 


\    9n-X    I 

(10.40) 
where  the  terms  *  are  bounded  as  <  — *  oo.  Now  if  v  were  geometrically  double,  we  could 
take  ^1  =  0.  But  by  induction 


det 


/  2  +  0(1)  0[hV)  0 

-4-(://62P2)'  +  o(l)     2  +  0(1)     0[hl) 


\ 


\ 


I 


in-2 


+  0(1)  #0, 


which  implies  (gi, . .  ..gn-i)     =  0,  contradicting  17  7^  0.  Thus  the  geometric  multiplicity  is 

one. 

Remark  10.41.  One  can  show  that  the  eigenvalue  equation  always  takes  the  form  ( 10.40), 

even  when  the  6,"s  are  not  small,  provided  we  replace  0(6;),  0(63),  ...  by  certain  positive 

quantities.  This  implies,  in  particular,  that  the  geometric  multiplicity  of  any  eigenvalue  of 

A'(i,i  +  1)  is  at  most  2,  for  all  t. 

Next  we  show  that  for  the  eigenvalue  problem  (10.27),  as  t  —  co. 


7'^ 

9' 


I. J 


f 


>  0 


if      Im  A  >  0  , 


(10.42) 


and 


.9' 


I. J 


f" 


<  0 


if      Im  A  <  0  , 


;i0.43) 


where  (•,•)  =  (•,-)m  denotes  the  real  Euclidean  inner  product  in  R"^.  Indeed,  from  (10.27), 

{E2  +  Oo(b^W     ={u-E,B-)f' 
{l  +  B'E3)B'f'     =(u-B^E^)g' 
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which  implies 


=  (u  B-^)g' +  0{l)  , 


where  we  have  normalized  ||p'||  =  1.  Thus 

(g',f')  =  u\\B-'gt  +  0{l). 


(10.44) 


Now  suppose  |i/||5~^5'|p|  remains  bounded  as  <  — ♦  oo.  Then  (g* ,  f)  remains  bounded  by 
(10.44)  and  as  /  —  oo 

(5^,  (^25'))  =  0{b')  +  i^it,  /')  -  ig',  E.B^f)  ^  0 

as  6  —  0.  1/  ^  0  and  B'^  f  =  (1  +  B'^E^y'^iu  -  B'^Ei)g'  —  0.  But  this  contradicts 
|(p^,£25')i  >  inf  spec(-£:2)  >  0.  Hence  i^p-'g'lp  ^  oo  as  <  ~  oo.  Formulae  (10.42)  and 
(10.43)  now  follow  from  (10.44)  and  Figure  1. 

Suppose  A  ^  A      is  an  eigenvalue  of  A',  KO,)  =  A(M.  Then  as  A'  is  symplectic, 


J'K; 


=  ,A-i;  .,VA-; 


=  AA 


'.■<■ 


rn   - 


which  implies  (  (i,),  iJ{g,)]  =  0  sls  XX  ^  1,  contradicting  ( 10.42),  (10.43).  (Note  Im  A  7^  0. 

by  Figure  1).    Thus  |Ap   =   AA  =   1.    Furthermore,  by  our  previous  Ccilculations,  A  has 
geometric  multiplicity  one.  Suppose  A  corresponds  to  a  nontrivial  Jordan  block, 

/  A     1 

0     A 

0 


A'       =  U 


U 


-1 


A-^    =U-T 


\        ■  I 

/  A    0       ...      \ 

1     A       0 


V 


\J^ 


for  some  invertible  matrix  \J .    Set  (^,)   =   \Jex\  then  A'(-^')    =   A(^,').     But  K'^J{L)   = 

JK-'^{L,)  =  A     J(^,)i  so  that  Jf  =  ^^^1  CjlJ-'^ej  for  suitable  constants  Cj,  where;  =  1  is 
excluded  as  U~^ei  is  not  an  eigenvector  of  K^.  Thus 

again  contradicting  (10.42),  (10.43).   We  conclude  that  A  is  algebraically  simple,  and  this 
concludes  the  proof  of  Fact  2. 
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t=] 


-JVI 


Fieure  2: 


'o 


In  the  initial  aumerical  experiments  mentioned  in  the  Introduction,  in  addition  to 
Facts  1,  2  and  3,  it  was  observed  that  whenever  an  eigenvalue  A  came  off  the  unit  cir- 
cle for  finite  i  =  j,  the  break  occurred  through  A  =  -1, 

The  preceding  computations  give  some  insight  into,  but.  as  yet,  not  a  complete  proof  of, 
this  phenomenon.  Indeed,  if  Ai  ^  Xo  are  two  distinct  eigenvalues  of  A",  Khi  =  A,/i,,  i  =  1.2. 
with  positive  imaginary  parts,  then  arguing  as  above,  (hi^iJho)  =  0.  But  then  for  t  large, 
when  the  spectrum  of  K{t  +  I,i)  is  simple,  we  must  have 

{h.iJh)>0  (10.45) 

for  all  nonzero  h  in  V+{t)  =  span{iii  :  K{t  +  l,t)w  —  \w,  Im  A  >  0}.  But  V+(i)  is  clearly 
continuous  in  t  as  long  as  the  spectrum  of  K(t  +  l,t)  does  not  cross  the  real  axis.  Let 
^0  <  oo  be  the  last  time  for  which  spec  I{{t  -I- 1,  0  H  R?^  0.  (Such  a  time  may,  of  course,  not 
exist.)  It  follows  by  continuity  that  ( 10.45)  holds  for  all  t  >  to,  and  using  arguments  similar 
to  those  above,  one  obtains  the  following  result:  viewed  bacicwards  in  time  from  t  —  oo,  the 
matrix  K(t  +  1,0  is  diagonalizable  with  spec  K{t  -•-  l.t)  C  {A  ;  |A|  =  1},  until  such  a  time 
to  that  an  eigenvalue  touches  the  real  axis  at  A  =  +1  or  A  =  -1. 

.A.t  this  point,  however,  it  is  not  clear  how  to  rule  out  the  case  A  =  1. 
Remark  10.46  The  reader  familiar  with  dynamical  stability  theory  will  recognize  that  the 
above  computations  are  modeled  on  the  strong  stability  theory  of  Krein  ( [KreoO.KreSol;  see 
also  [GL55].  [Mos58])  for  symplectic  matrices. 


58 


11     Numerical  Experiments 

In  this  section  we  summarize  numerical  experiments  which  support  the  error  analysis  of 
sections  5  and  6.  The  first  set  of  experiments  determines  the  average  number  of  QR  steps 
necessary  for  convergence,  and  were  performed  in  [DK88];  we  just  cite  the  needed  data  here. 
The  second  set  describes  the  growth  of  ||M(OII  for  the  same  set  of  test  problems.  The  first 
set  of  experiments  were  performed  using  Fortran  on  a  SUN  4/260  in  IEEE  standard  double 
precision  floating  point  arithemetic  [IEE85];  the  machine  precision  e  =  2"^^  ss  10~'^  and 
the  range  of  representable  numbers  is  approximately  10*^-^®.  The  second  set  of  experiments 
were  performed  using  Matlab  on  the  same  SUN  4/260  using  the  same  arithmetic. 

The  test  matrices  were  the  same  105  bidiagonal  matrices  in  12  classes  used  in  [DK88]: 

Class  1:  These  eight  matrices  are  graded  in  the  usual  way  from  large  at  the  upper  left  to 
small  at  the  lower  right.  Four  of  the  matrices  are  10  by  10  and  four  are  20  by  20.  The 
singular  values  range  from  1  to  10"^  in  some  examples. 

Class  2:  This  class  is  identical  to  class  1  except  the  order  of  the  entries  on  the  diagonal 
and  superdiagonal  are  reversed.  Thus  these  matrices  are  graded  from  small  at  the 
upper  left  corner  to  large  at  the  lower  right. 

Class  3:  These  eight  20  by  20  and  40  by  40  matrices  are  obtained  by  abutting  those  in 
class  1  with  their  reversals  in  cltiss  2.  Thus  each  matrix  is  small  at  the  upper  left, 
large  in  the  middle,  and  small  again  at  the  lower  right. 

Class  4:  These  eight  20  by  20  and  40  by  40  matrices  are  obtained  by  abutting  those  in 
class  2  with  their  reversals  in  class  1.  Thus  each  matrix  is  large  at  the  upper  left, 
small  in  the  middle,  and  large  again  at  the  lower  right. 

Class  5:  These  eight  matrices  are  obtained  from  class  1  by  reversing  the  order  of  the 
superdiagonals.  Thus  the  diagonal  is  graded  from  large  at  the  upper  left  to  small  at 
the  lower  right,  and  the  superdiagonal  is  graded  in  the  opposite  direction. 

Class  6:  These  eight  matrices  are  obtained  from  class  5  by  reversing  the  order  of  both 
the  diagonals  and  superdiagonals.  Thus  the  diagonal  is  graded  from  small  at  the 
upper  left  to  large  at  the  lower  right,  and  the  superdiagonal  is  graded  in  the  opposite 
direction. 

Class  7:  These  sixteen  matrices  are  all  small  on  the  diagonal  and  mostly  large  on  the 
offdiagonal.  The  diagonals  range  from  10"^  down  to  10"'^  and  the  diagonals  are 
mostly  1  with  occzisional  small  values. 

Classes  8-11:  The  ten  20  by  20  matrices  in  each  class  are  generated  by  letting  each  bidi- 
agonal entry  be  a  random  number  of  the  form  r  ■  10',  where  r  is  a  reindom  number 
uniformly  distributed  between  -.5  and  .5,  and  i  is  a  random  integer.  In  claiss  8,  t  is 
uniformly  distributed  from  0  to  -15.  In  class  9,  i  is  uniformly  distributed  from  0  to 
-10.  In  class  10,  i  is  uniformly  distributed  from  0  to  -5.  In  cIjiss  11,  i  is  identically 
0.  Thus,  in  class  11  each  matrix  entry  is  simply  uniformly  distributed  on  (-.5,  .5]. 

Class  12:  This  one  41  by  41  matrix  is  graded  in  as  in  class  1,  with  the  ratio  of  adjacent 
entries  being  10"-^  %  .79.  Each  offdiagonal  entry  is  identical  to  the  diagonal  entry 
below  it.  This  very  dense  grading  leads  to  different  convergence  properties  than  for 
the  matrices  in  cIjlss  1,  which  is  why  we  put  this  example  in  a  separate  class. 
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Now  we  describe  the  results  of  the  first  experiment,  which  computes  the  number  of  QR 
steps  needed  for  convergence  with  relative  error  tolerance  tol  =  lOOe  %  lO"^*.  This  is 
the  number  m  in  the  statement  of  Theorem  6.1.  Actually,  we  compute  a  related  quantity 
which  is  more  closely  related  to  the  actual  work  done:  the  number  of  "QR  inner  loops" 
divided  by  n{n  +  l)/2,  where  n  is  the  matrix  dimension  and  one  "QR  inner  loop"  is  one 
pass  through  the  inner  loop  of  the  QR  algorithm  (shifted  or  unshifted).  The  reason  for 
choosing  this  statistic  is  as  follows.  The  usual  rule  of  thumb  for  the  number  of  QR  steps 
it  takes  to  compute  the  SVD  is  two  steps  per  singular  value  [ParSO].  If  convergence  always 
takes  place  at  the  end  of  the  matrix,  this  means  there  will  be  2  steps  on  a  matrix  of  length 
i,  for  t  =  n,  n  —  1, . . .  ,3  (two  by  two  matrices  are  handled  specizdly).  Thus,  since  one  QR 
step  on  a  matrix  of  length  i  consists  of  i  "QR  inner  loops",  we  expect  an  average  of  about 
n(n+  1)  "QR  inner  loops"  for  the  entire  SVD.  Thus,  the  quantity  "QR  inner  loops"  divided 
by  n{n  +  l)/2  should  be  a  measure  of  the  difficulty  of  computing  the  SVD  of  a  matrix 
which  is  independent  of  dimension,  and  we  expect  it  to  equaJ  2  on  the  average.  For  each  of 
the  twelve  problem  classes,  and  for  both  the  algorithm  of  section  3  and  the  standard  SVD 
algorithm  [BDMS79],  the  minimum,  average  and  maximum  of  this  statistic  is  presented  in 
Table  1. 


Table  1:  QR  inner  loops  /  (n( 

n  +  l)/2) 

Class 

Standard  SVD 

N 

ew  SVD 

Min 

Avg 

Max 

Min 

Avg 

Max 

1 

.60 

.90 

1.33 

.09 

.49 

1.11 

2 

.60 

1.94 

3.07 

.09 

.49 

1.11 

3 

.61 

.85 

1.19 

.56 

.82 

1.19 

4 

.32 

1.04 

1.80 

.35 

.60 

1.04 

5 

.07 

.45 

1.11 

.09 

.57 

1.42 

6 

.07 

.40 

.93 

.09 

.57 

1.42 

7 

.10 

1.32 

2.31 

.10 

1.04 

1.85 

8 

.41 

.64 

.95 

.26 

.49 

.77 

9 

.79 

.94 

1.29 

.57 

.75 

.93 

10 

1.07 

1.29 

1.57 

1.04 

1.22 

1.48 

11 

1.97 

2.26 

2.52 

2.06 

2.20 

2.41 

12 

1.53 

1.53 

1.53 

2.96 

2.96 

2.96 

Since  Class  11  corresponds  to  matrices  with  uniform  random  entries,  we  see  that  the 
rule  of  thumb  of  2  QR  steps  per  singular  value  is  justified.  In  the  worst  case.  Class  12,  the 
new  SVD  algorithm  takes  3  QR  steps  per  eigenvalue.  In  the  other  classes  it  takes  many 
fewer  step  to  reach  convergence.  Thus,  in  practice  we  can  bound  the  number  of  QR  steps 
by  m  =  3n  in  order  to  obtain  an  upper  bound  depending  only  on  the  matrix  dimension  n 
in  Theorem  6.1.  Of  course,  after  the  algorithm  has  been  run  m  is  easily  available,  so  that 
Theorem  6.1  could  be  used  to  get  less  pessimistic  bounds. 

The  second  set  of  experiments  measured  ||A/(j,  0)||oo  for  the  same  test  cases  as  above. 
We  computed  M(j,  0)  as  follows.  The  first  order  perturbation  theory  in  Lemmas  5.2  and 
5.4  can  be  seen  as  computing  the  linear  operator  M{i  +  1,  i)  which  maps  the  relative  errors 
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(a  ,  •  • -.fon'^b, >  ■  • -(fkn-i  '^  ^■^^  entries  of  the  bidiagonaJ  matrix  B  to  the  relative  errors 
€»,...,  fa'  ,  Cj,' ,...,  €{,'  in  the  entries  of  the  bidiagonaJ  matrix  B'  after  one  zero-shift  QR 
step.  The  entries  of  this  matrix  are  computed  a5  products  of  the  2  by  2  matrices  appearing 
in  the  proofs  of  Lemmas  5.2  and  5.4,  and  so  the  entries  of  M{i  +  l,t)  are  complicated 
polynomials  in  the  sines  of  cosines  of  rotation  angles  occuring  during  the  running  of  the 
algorithm.  We  obtain  M(j,0)  =  M{j,j  -  1)  •  M(1,0)  via  matrix  multiplication. 

The  experiments  were  performed  by  taking  each  one  of  the  105  test  matrices  and  running 
zero-shift  QR  until  M(:  +  \,i)  had  converged  to  its  asymptotic  value,  and  the  graph  of 
||A/(j,  0)||oo  versus  j  had  converged  to  a  straight  line;  this  convergence  was  determined  by 
examining  the  graph.  (This  is  not  the  same  as  computing  all  the  M{j,  i)  arising  during  the 
running  of  the  overall  hybrid  algorithm  on  the  test  cases,  but  is  nonetheless  a  thorough  test 
of  our  predicted  upper  bound  (8ti  -  4)(j  -  i)  -(-  0(1)  of  Theorem  9.23  on  ||A/(j,  i)l|oo-)  For 
each  test  matrix  the  computed  values  of  ||A/(j,0)||oo  were  analyzed  as  follows: 

1-  Let  jijnAx  be  the  number  of  QR  steps  taken  and  n  the  matrix  dimension. 

2.  Let  s  =  (||A/(j„uuc,0)||oo  -  ||A/(jniAx-i,0)||oo)/n  be  the  asymptotic  rate  of  growth  of 
||Af(j,0)|U  (divided  by  n). 

3.  Let  r  =  maxi<j<j„„(||M(j,0)||oo  -  n  ■  s  ■  j).  Then  for  all  1  <  j  <  jniax  we  have 
||A/(j,  0)||oo  <  nsj  +  T.  In  other  words,  the  line  nsj  +  r  is  the  tightest  affine  upper 
bound  to  ||A/(;,0)||oo. 

4.  Let  i  =  tnaxi <;<;„,,  (n5j  +  r  -  ||A/(j,0)||oo)  be  the  maximum  amount  the  straight  line 
nsj  +  r  overestimates  |lA/(j,0)||oo. 


Table 

2:  Growth  Statistics 

for  \\M(j 

,0)||oc 

Class 

Jmax 

max  5 

maxr 

maxt 

min 

max 

1 

5 

20 

2.00 

-10.00 

.26 

2 

10 

80 

2.00 

-.02 

1.23 

3 

10 

80 

2.18 

-8.67 

4.63 

4 

20 

90 

1.86 

-7.16 

3.34 

5 

20 

90 

2.10 

-1.00 

16.02 

6 

20 

90 

2.10 

1.00 

18.00 

7 

20 

90 

4.01 

-19.00 

18.00 

8 

30 

30 

5.06 

24.73 

151.28 

9 

40 

40 

4.00 

18.16 

59.93 

10 

40 

40 

3.41 

26.29 

49.11 

11 

40 

40 

4.15 

8.71 

27.58 

Table  2  summarizes  the  values  of  s,  r  and  t  computed.  Columns  2  and  3  give  the 
minimum  and  maximum  values  of  jniax  for  the  given  class.  Columns  4,  5  and  6  give  the 
maximum  values  of  r,  s  and  t,  respectively,  in  each  claiss. 

The  largest  overshoot  151.28  corresponds  to  a  16%  change  in  nsj^n^  +  r,  and  small 
deviation  from  linearity.  Excluding  this  case,  the  maximum  deviation  is  less  than  5%.  So 
even  though  the  analysis  leading  to  the  linear  growth  bound  on  ||A/(j,0)||oo  was  asymptotic, 
we  find  linear  growth  sets  in  quite  early,  much  earlier  than  we  can  currently  explain. 
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The  smallest  values  5  and  r  which  satisfy  nsj  +  r>  nsj  +  r  for  all  105  (5,  r)  pairs  and 
all  j  are  5  =  5.06  and  r  =  0.  This  justifies  the  claims  made  earlier  and  used  in  the  error 
analyses  of  sections  5  and  6. 
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12      Conclusions 

We  have  proven  that  the  singular  values  of  a  bidiagonal  matrix  are  much  less  sensitive 
to  relative  perturbations  in  the  matrix  entries  than  previously  thought:  the  uncertainty  of 
singular  vector  v,  is  proportional  to  the  reciprocal  of  the  relative  gap  mm j^,  |<7| -«Tj|/(ff,+<7j) 
rather  than  the  reciprocal  of  the  absolute  gap  coinj^,  \(t,  —  <yj\/(ymAx-  When  the  matrix  has 
two  or  more  tiny  singular  values,  the  relative  gap  can  be  much  larger  than  the  absolute  gap, 
so  the  bound  is  much  tighter. 

We  have  also  shown  that  the  algorithm  in  [DK88]  is  capable  of  computing  the  singular 
values  to  this  higher  accuracy.  The  proof  involves  a  new  analysis  of  the  stopping  criterion,  as 
well  as  showing  that  rounding  errors  during  the  zero-shift  QR  algorithm  accumulate  slowly. 
This  latter  analysis  is  facilitated  by  associating  to  zero-shift  QR  a  Hamiltonian  differential 
equation  which  interpolates  the  iterates  of  the  algorithm.  In  contrjist  to  many  eigenvalue 
algorithms  where  the  underlying  Hamiltonian  structures  are  Lie-Poisson  structures  (see, 
e.g.,  [DLNT86,DLT89]),  here  the  underlying  structure  is  a  so-called  Sklyanin  structure.  The 
canonical  variables  on  the  appropriate  symplectic  leaves  for  the  bidiagonal  case  turn  out  to 
be  linear  combinations  of  logarithms  of  the  matrix  entries.  The  differential  equation  shows 
that  these  canonical  variables  are  relatively  insensitive  to  changes  in  the  initial  conditions. 
Since  the  canonical  variables  are  essentially  logarithms  of  matrix  entries,  this  means  that 
the  logarithms  of  the  matrix  entries  are  insensitive.  This  in  turn  means  that  small  relative 
errors  (such  as  rounding  errors)  in  the  matrix  entries  grow  slowly. 

This  use  of  a  differential  equation  suggests  the  following  paradigm  for  understanding 
the  attainable  accuracy  of  eigenvalue  algorithms:  given  a  Poisson  structure  in  which  a  given 
eigenvalue  algorithm  is  Hamiltonian,  one  should  try  to  construct  natural  global  canonical 
variables  (i.e.  a  global  Darboux  coordinate  system).  Such  variables  would  indicate  which 
functions  of  the  eigenvalues  (in  our  case,  their  logarithms)  are  relatively  insensitive  to 
appropriate  perturbations  in  the  matrix  (in  our  case,  relative  perturbations  in  the  entries) 
and  hence  are  computable  to  high  accuracy. 

The  algorithm  described  here  will  be  part  of  the  LAPACK  linear  algebra  library  for 
supercomputers  [DDDC*87]. 
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